


WORLD INTELLECTUAL PROPERTY ORGANIZATION

PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification: C07K 14/00, C12N 15/00, C12P 21/00, A01K 67/00

A1

(11) International Publication Number: WO 96/20951
(43) International Publication Date: 11 July 1996 (11.07.96)

(21) International Application Number: PCT/US95/16982

(22) International Filing Date: 29 December 1995 (29.12.95)

(30) Priority Data: 08/366,083, 29 December 1994 (29.12.94), US

(71) Applicant (for all designated States except US): MASSACHUSETTS INSTITUTE OF TECHNOLOGY [US/US];
77 Massachusetts Avenue, Cambridge, MA 02139 (US).

(72) Inventors; and

(75) Inventors/Applicants (for US only): POMERANTZ, Joel, L. [US/US]; 287 Harvard Street #25,
Cambridge, MA 02139 (US). SHARP, Phillip, A. [US/US]; 36 Fairmont Avenue, Newton, MA 02158 (US).
PABO, Carl, O. [US/US]; 18 Weldon Road, Newton, MA 02158 (US).

(74) Agent: BERSTEIN, David, L.; Ariad Pharmaceuticals, Inc., 26 Landsdowne Street, Cambridge,
MA 02139 (US).

(81) Designated States: AL, AM, AT, AU, AZ, BB, BG, BR, BY, CA, CH, CN, CZ, DE, DK, EE, ES, FI, GB,
GE, HU, IS, JP, KE, KG, KP, KR, KZ, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT,
RO, RU, SD, SE, SG, SI, SK, TJ, TM, TT, UA, UG, US, UZ, VN, ARIPO patent (KE, LS, MW, SD, SZ, UG),
European patent (AT, BE, CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OAPI patent (BF,
BJ, CF, CG, CI, CM, GA, GN, ML, MR, NE, SN, TD, TG).

Published

With international search report.

Before the expiration of the time limit for amending the claims and to be republished in the
event of the receipt of amendments.

(54) Title: CHIMERIC DNA-BINDING PROTEINS

(57) Abstract

Chimeric proteins containing composite DNA-binding regions are disclosed together with DNA
constructs encoding them, compositions containing them and applications in which they are useful.






WO 96/20951    PCT/US95/16982

Chimeric DNA-Binding Proteins 

Government Support 

A portion of the work described herein was supported by grants PO1-CA42063, CDR-8803014 and
P30-CA14051 from the Public Health Service/National Institutes of Health, National Science
Foundation and National Cancer Institute, respectively. The U.S. Government has certain rights
in the invention. A portion of the work described herein was also supported by the Howard Hughes
Medical Institute.

Background of the Invention 

DNA-binding proteins, such as transcription factors, are critical regulators of gene
expression. For example, transcriptional regulatory proteins are known to play a key role in
cellular signal transduction pathways which convert extracellular signals into altered gene
expression (Curran and Franza, Cell 55:395-397 (1988)). DNA-binding proteins also play critical
roles in the control of cell growth and in the expression of viral and bacterial genes. A large
number of biological and clinical protocols, including among others, gene therapy, production of
biological materials, and biological research, depend on the ability to elicit specific and
high-level expression of genes encoding RNAs or proteins of therapeutic, commercial, or
experimental value. Such gene expression is dependent on protein-DNA interactions.

Attempts have been made to change the specificity of DNA-binding proteins. Those
attempts rely primarily on strategies involving mutagenesis of these proteins at sites important
for DNA-recognition (Rebar and Pabo, Science 262:671-673 (1994), Jamieson et al., Biochemistry
22:5689-5695 (1994), Suckow et al., Nucleic Acids Research 22(12):2198-2208 (1994)). This
strategy may not be efficient or possible with some DNA-binding domains because of limitations
imposed by their three-dimensional structure and mode of docking to DNA. In other cases it may
not be sufficient to achieve important objectives discussed below. Therefore, it is desirable to
have a strategy which can utilize many different DNA-binding domains and can combine them
as required for DNA recognition and gene regulation.

Summary of the Invention

This invention pertains to chimeric proteins which contain at least one composite
DNA-binding region and possess novel nucleic acid binding specificities. The chimeric proteins
recognize nucleotide sequences (DNA or RNA) spanning at least 10 bases and bind with high
affinity to oligonucleotides or polynucleotides containing such sequences. (It should be
understood that the nucleotide sequences recognized by the chimeric proteins may be RNA or DNA,
although for the sake of simplicity, the proteins of this invention are typically referred to as
"DNA-binding", and RNA too is understood, if not necessarily mentioned.)
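
To give a rough sense of why the length of the recognized sequence matters, the following sketch estimates how often a fixed site of a given length would be expected to occur by chance. It is only an illustration: it assumes a random sequence with independent, equally frequent bases (which real genomes do not satisfy), and the function name and the 3,000,000,000 bp genome size are hypothetical choices, not values from this disclosure.

```python
def expected_chance_sites(site_length: int, genome_length: int) -> float:
    """Expected number of chance matches to one fixed site on one strand.

    Assumes independent, equally likely bases (A, C, G, T), so a given
    site of n bases matches any start position with probability (1/4)**n.
    """
    if site_length < 1 or genome_length < site_length:
        raise ValueError("site must be non-empty and fit in the genome")
    positions = genome_length - site_length + 1
    return positions * 0.25 ** site_length


if __name__ == "__main__":
    genome = 3_000_000_000  # illustrative genome size in bp (assumption)
    for n in (6, 10, 12):
        print(n, expected_chance_sites(n, genome))
```

Under these simplifying assumptions, each additional base in the site lowers the expected number of chance occurrences by a factor of four.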

The terms "chimeric" protein and "composite" domain are used to denote a protein or
domain containing at least two component portions which are mutually heterologous in the sense
that they do not occur together in the same arrangement in nature. More specifically, the
component portions are not found in the same continuous polypeptide sequence or molecule in
nature, at least not in the same order or orientation or with the same spacing present in the
chimeric protein or composite domain.

As discussed in detail below, a variety of component DNA-binding polypeptides known
in the art are suitable for adaptation to the practice of this invention. The chimeric proteins
contain a composite region comprising two or more component DNA-binding domains, joined
together, either directly or through one amino acid or through a short polypeptide (two or more
amino acids) to form a continuous polypeptide. Additional domains with desired properties can
optionally be included in the chimeric proteins. For example, a chimeric protein of this invention
can contain a composite DNA-binding region comprising at least one homeodomain, such as the
Oct-1 homeodomain, together with a second polypeptide domain which does not occur in nature
identically linked to that homeodomain. Alternatively, the composite DNA-binding domain
can comprise one or more zinc finger domains such as zinc finger 1 and/or finger 2 of Zif268,
together with a second polypeptide domain which does not occur in nature linked to that zinc
finger domain(s).

A number of specific examples examined in greater detail below involve chimeric
proteins containing a composite DNA-binding region comprising a homeodomain and one or two
zinc finger domains. In one embodiment, the chimeric protein is a DNA-binding protein
comprising at least one homeodomain, a polypeptide linker and at least one zinc finger domain.
Such a chimeric protein is exemplified by a composite DNA-binding region containing zinc finger
1 or zinc finger 2 of Zif268, an amino acid or a short (2-5 amino acid residue) polypeptide, and the
Oct-1 homeodomain. Another example is a chimeric protein containing a composite DNA-binding
region comprising zinc fingers 1 and 2 of Zif268, a short linker, such as a
glycine-glycine-arginine-arginine polypeptide, and the Oct-1 homeodomain. The latter chimeric
protein, designated ZFHD1, is described in detail below. Other illustrative composite DNA-binding
regions include those comprising the Oct1 POU specific domain (aa 268-343) and its own flexible
linker (aa 344-366) fused to the amino terminus of ZFHD1 and ZFHD1 fused at its carboxy terminus
to Zif268 fingers 1 and 2 (aa 333-390) via the Oct1 flexible linker.
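
The modular make-up of a composite DNA-binding region can be sketched as a simple data structure. The snippet below is an illustrative model only: the class, field and function names are hypothetical, the segment order for ZFHD1 follows the order in which the text lists its parts, and the check covers just one case of the definition (DNA-binding domains taken from at least two different proteins), not the case of non-adjacent portions of a single protein.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    name: str             # label for the segment
    source: str           # protein the segment is taken from, or "linker"
    is_dna_binding: bool  # True for a component DNA-binding domain


def is_composite_region(segments: List[Segment]) -> bool:
    """True if DNA-binding domains from two or more different sources are joined.

    Simplification (assumption): heterology is judged by source protein only.
    """
    sources = {s.source for s in segments if s.is_dna_binding}
    return len(sources) >= 2


# ZFHD1 as described in the text: Zif268 fingers 1 and 2, a short linker,
# and the Oct-1 homeodomain.
ZFHD1 = [
    Segment("zinc finger 1", "Zif268", True),
    Segment("zinc finger 2", "Zif268", True),
    Segment("Gly-Gly-Arg-Arg linker", "linker", False),
    Segment("homeodomain", "Oct-1", True),
]

if __name__ == "__main__":
    print(is_composite_region(ZFHD1))
```

Additional domains (activation, repression, cleavage, ligand-binding) could be appended to such a list as further non-DNA-binding segments.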






In other embodiments, the chimeric protein comprises a composite DNA-binding region
containing a chimeric zinc finger-basic-helix-loop-helix protein. One such chimeric protein
comprises fingers 1 and 2 of Zif268 and the MyoD bHLH region, joined by a polypeptide linker
which spans approximately 95 Å between the carboxyl-terminal region of finger 2 and the
amino-terminal region of the basic region of the bHLH domain.

In another embodiment the chimeric protein comprises a composite DNA-binding region
containing a zinc finger-steroid receptor fusion. One such chimeric protein comprises fingers 1 and
2 of Zif268 and the DNA-binding domains of the glucocorticoid receptor, joined at the
carboxyl-terminal region of finger 2 and the amino-terminal region of the DNA-binding domain of
the glucocorticoid receptor by a polypeptide linker which spans approximately 7.4 Å.

As will be seen, one may demonstrate experimentally the selectivity of binding of a
chimeric protein of this invention for a recognized DNA sequence. One aspect of that specificity
is that the chimeric protein is capable of binding to its recognized nucleotide sequence
preferentially over binding to constituent portions of that nucleotide sequence or binding to
different nucleotide sequences. In that sense, the chimeric proteins display a DNA-binding
specificity which is distinct from that of each of the component DNA-binding domains alone;
that is, they prefer binding the entire recognized nucleotide sequence over binding to a DNA
sequence containing only a portion thereof. That specificity and selectivity means that the
practitioner can design composite DNA-binding regions incorporating DNA-binding domains of
known nucleotide binding specificities with the knowledge that the composite protein will
selectively bind to a corresponding composite nucleotide sequence and will do so preferentially
over the constituent nucleotide sequences.

These chimeric proteins selectively bind a nucleotide sequence, which may be DNA or
RNA, spanning at least 10 bases, preferably at least 11 bases, and more preferably 12 or more
bases. By way of example, one can experimentally demonstrate selective binding for a 12-base
pair nucleotide sequence using the illustrative ZFHD1 composite DNA-binding domain.
Typically one will obtain binding to the selected DNA sequence with a Kd value of about 10^ or
better, preferably 10^ or better and even more preferably 10^-10 or better. Kd values may be
determined by any convenient method. In one such method one conducts a series of conventional
DNA binding assays, e.g., gel shift assays, varying the concentration of DNA and determining
the DNA concentration which correlates to half-maximal protein binding.
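
The half-maximal binding approach described above can be sketched numerically. This is a minimal illustration under stated assumptions, not the procedure used in this disclosure: it treats the largest measured signal as saturation, uses linear interpolation between neighboring data points rather than a fitted binding curve, and the function name and the titration values in the usage block are hypothetical.

```python
from typing import Sequence


def half_maximal_concentration(conc: Sequence[float], bound: Sequence[float]) -> float:
    """Interpolate the concentration at which binding reaches half its maximum.

    conc  -- DNA concentrations in increasing order (any consistent unit)
    bound -- measured binding signal at each concentration
    """
    if len(conc) != len(bound) or len(conc) < 2:
        raise ValueError("need two or more paired measurements")
    half = max(bound) / 2.0  # assumes the largest signal represents saturation
    for i in range(1, len(conc)):
        lo, hi = bound[i - 1], bound[i]
        if lo <= half <= hi and hi > lo:
            frac = (half - lo) / (hi - lo)
            return conc[i - 1] + frac * (conc[i] - conc[i - 1])
    raise ValueError("half-maximal binding is not bracketed by the data")


if __name__ == "__main__":
    # Hypothetical titration, for illustration only.
    concentrations = [0.0, 1.0, 2.0, 4.0, 8.0]
    signal = [0.0, 0.30, 0.55, 0.90, 1.00]
    print(half_maximal_concentration(concentrations, signal))
```

In practice a fit to a binding isotherm would usually be preferred over interpolation, particularly when the data do not clearly reach saturation.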

The nucleotide sequence specificity of binding by chimeric proteins of this invention,
illustrated by proteins comprising the peptide sequence of ZFHD1, renders them useful in a
number of important contexts because their DNA-binding properties are distinct from those of
known proteins. Such uses include the selective transcription, repression or inhibition of
transcription, marking, and cleavage of a target nucleotide sequence. The chimeric proteins prefer
to bind to a specific nucleic acid sequence and, thus, mark, cleave or alter expression of genes
linked to or controlled by a nucleotide sequence containing the recognized nucleic acid sequence.
Preferably, the chimeric proteins do not to a significant extent bind the DNA bound by the
component domains of the composite DNA-binding region, and, thus, do not mark, cleave or alter
normal cellular gene expression other than by design.

In one application, the chimeric proteins bind a selected nucleic acid sequence within a
DNA or RNA and, as a result, mark or flag the selected DNA or RNA sequence, which can be
identified and/or isolated from the DNA using known methods. In this respect, the chimeric
proteins act in a manner similar to restriction enzymes, in that they recognize DNA or RNA at a
selected nucleic acid sequence, thus marking that sequence wherever it occurs in DNA or RNA
with which the chimeric proteins are contacted. Unlike restriction enzymes, chimeric DNA-binding
or RNA-binding proteins do not cut or fragment the DNA or RNA at the nucleic acids
they recognize. Chimeric proteins used for this purpose can be labelled, e.g., radioactively or
with an affinity ligand or epitope tag such as GST, and thus, the location of DNA or RNA to
which they bind can be identified easily. Because of the binding specificity of the chimeric
proteins, DNA or RNA to which binding occurs must include either the nucleotide sequence which
the chimeric proteins have been designed to recognize or the nucleotide sequences recognized by
the component DNA-binding domains. Optimally, the chimeric protein will not efficiently
recognize the nucleotide sequence recognized by the component DNA-binding domains. Standard
methods, such as DNA cloning and sequencing, can be used to determine the nucleotide sequence to
which the chimeric protein is bound.

In view of the ability of a composite DNA-binding region to fold and function in an
autonomous manner, chimeric proteins of the various embodiments of this invention may further
comprise one or more additional domains, including for example a transcription activation
domain, a transcription repressing domain, a DNA-cleaving domain, a ligand-binding domain, or
a protein-binding domain.

Such a chimeric protein which contains a transcription activation domain constitutes a
chimeric transcription factor which is capable of activating the transcription of a gene linked to
a DNA sequence recognized (i.e., selectively bound) by the chimeric protein. Various
transcription activation domains are known in the art and may be used in chimeric proteins of
this invention, including the Herpes Simplex Virus VP16 activation domain and the NF-kB p65
activation domain which are derived from naturally occurring transcription factors. One class of
such transcription factors comprises at least one composite DNA-binding region, e.g., one containing
at least one homeodomain and at least one zinc finger domain (such as the peptide sequence of
ZFHD1), and at least one additional domain capable of activating transcription of a gene linked
to a DNA sequence to which the transcription factor can bind. These are illustrated by the
ZFHD1-VP16 and ZFHD1-p65 chimeras discussed below.

Chimeric proteins of this invention also include those which are capable of repressing
transcription of a target gene linked to a nucleotide sequence to which the chimeric proteins bind.
Such a chimeric protein functions as a somewhat classical repressor by binding to a nucleotide
sequence and blocking, in whole or part, the otherwise normal functioning of that nucleotide
sequence in gene expression, e.g., binding to an endogenous transcription factor. Other chimeric
proteins of this invention which are capable of repressing or inhibiting transcription of a target
gene linked to a nucleotide sequence to which the chimeric protein binds include chimeric
proteins containing a composite DNA-binding region, characteristic of all chimeric proteins of
this invention, and an additional domain, such as a KRAB domain or a ssn-6/TUP-1 or
Kruppel-family suppressor domain, capable of inhibiting or repressing the expression of the target
gene in a cell. In either case, binding of the chimeric protein to the nucleotide sequence linked to
the target gene is associated with decreased transcription of the target gene.

Chimeric proteins of this invention also include those which are capable of cleaving a
target DNA or RNA linked to a nucleotide sequence to which the chimeric proteins bind. Such
chimeric proteins contain a composite DNA-binding region, characteristic of all chimeric
proteins of this invention, and an additional domain, such as a FokI domain, capable of cleaving
a nucleic acid molecule. Binding of the chimeric protein to the recognition sequence linked to the
target DNA or RNA is associated with cleavage of the target DNA or RNA.

Chimeric proteins of this invention further include those which are capable of binding to
another protein molecule, e.g., for use in conducting otherwise conventional two-hybrid
experiments. See e.g., Fields and Song, US Patent 5,283,173 (February 1, 1994). In addition to the
characteristic composite DNA-binding region, proteins of this embodiment contain an additional
domain which is, or may be, capable of binding to another protein, known or unknown. In such
experiments, the chimeric protein containing the composite DNA-binding region replaces the
GAL4-containing fusion protein in the 2-hybrid system and the nucleotide sequence recognized by
our chimeric protein replaces the GAL4 binding sites linked to the reporter gene.

Chimeric proteins of this invention further include those which further contain a
ligand-binding domain permitting ligand-regulated manifestation of biological activity. Chimeric
DNA-binding proteins of this aspect of the invention can be complexed or "dimerized" with
other ligand-binding fusion proteins by the presence of an appropriate dimerizing ligand.
Examples of such chimeric proteins include proteins containing a characteristic composite
DNA-binding region and a ligand-binding domain such as an immunophilin like FKBP12. The divalent
ligand, FK1012, for example, is capable of binding to a chimeric protein of this invention which
also contains one or more FKBP domains and to another FKBP-containing protein, including a
fusion protein containing one or more copies of FKBP linked to a transcription activation domain.
See Spencer, D.M., et al. 1993. Science 262:1019-1024, and PCT/US94/01617. Cells expressing
such fusion proteins are capable of dimerizer-dependent transcription of a target gene linked to a
nucleotide sequence to which the DNA-binding chimera is capable of binding.

This invention further encompasses DNA sequences encoding the chimeric proteins
containing a composite DNA-binding region. Such DNA sequences include, among others, those
which encode a chimeric protein in which the composite DNA-binding region contains a
homeodomain covalently linked to at least one zinc finger domain, exemplified by chimeric
proteins containing the peptide sequence of ZFHD1. As should be clear from the preceding
discussion, the DNA sequence may encode a chimeric protein which further comprises one or more
additional domains including, for instance, a transcription activation domain, a transcription
repressing domain, a domain capable of cleaving an oligonucleotide or polynucleotide, a domain
capable of binding to another protein, a ligand-binding domain or a domain useful as a detectable
label.

This invention further encompasses a eukaryotic expression construct containing a DNA
sequence encoding the chimeric protein operably linked to expression control elements such as
promoter and enhancer elements permitting expression of the DNA sequence and production of the
chimeric protein in eukaryotic cells. One or more of those expression control elements may be
inducible, permitting regulated expression of the DNA encoding the chimeric protein. The
expression control elements may be tissue-specific or cell-type-specific, permitting preferential
or selective expression of the chimeric protein in a cell-type or tissue of particular interest. An
example of a eukaryotic expression vector of this invention is the plasmid pCGNN ZFHD1-FKBPx3
(ATCC No. ) which is capable of directing the expression in mammalian cells of a
fusion protein containing a ZFHD1 composite DNA-binding region linked to three FKBP12
domains, discussed in greater detail below.

Using DNA sequences encoding the chimeric proteins of this invention, and vectors
capable of directing their expression in eukaryotic cells, one may genetically engineer cells for a
number of important uses. To do so, one first provides an expression vector or construct for directing
the expression in a eukaryotic cell of the desired chimeric protein and then introduces the vector
DNA into the cells in a manner permitting expression of the introduced DNA in at least a portion
of the cells. One may use any of the various methods and materials for introducing DNA into
cells for heterologous gene expression, many of which are well known. A variety of such
materials are commercially available.

In some cases the target gene and its linked nucleotide sequence specifically recognized by
the chimeric protein are endogenous to, or otherwise already present in, the engineered cells. In
other cases, DNA comprising the target gene and/or the recognized DNA sequence is not
endogenous to the cells and is also introduced into the cells.

The various DNA constructs may be introduced into cells maintained in culture or may be
administered to whole organisms, including humans and other animals, for introduction into cells
in vivo. A variety of methods and materials to effect the delivery of DNA into animals for the
introduction into cells are known in the art.

By these methods, one may genetically engineer cells, whether in culture or in vivo, to
express a chimeric protein capable of binding to a DNA sequence linked to a target gene within
the cells and marking the DNA sequence, activating transcription of the target gene, repressing
transcription of the target gene, cleaving the target gene, etc. Expression of the chimeric protein
may be inducible, cell-type-specific, etc., and the biological effect of the chimeric protein may be
ligand-dependent, all as previously mentioned.

This invention further encompasses genetically engineered cells containing and/or
expressing any of the constructs described herein, particularly a construct encoding a protein
comprising a composite DNA-binding region, including prokaryotic and eucaryotic cells and in
particular, yeast, worm, insect, mouse or other rodent, and other mammalian cells, including
human cells, of various types and lineages, whether frozen or in active growth, whether in
culture or in a whole organism containing them. Several examples of such engineered cells are
provided in the Examples which follow. Those cells may further contain a DNA sequence to
which the encoded chimeric protein is capable of binding. Likewise, this invention encompasses
any non-human organism containing such genetically engineered cells. To illustrate this aspect of
the invention, an example is provided of a mouse containing engineered cells expressing, in a
ligand-dependent manner, an introduced target gene linked to a nucleotide sequence recognized by
a chimeric protein containing a composite DNA-binding region.

The foregoing materials and methods permit one to mark a DNA sequence recognized by
the chimeric protein as well as to actuate or inhibit the expression of a target gene or to cleave the
target gene. To do so, one first provides cells containing and capable of expressing a first DNA
sequence encoding a chimeric protein which is capable of binding to a second DNA sequence
linked to a target gene of interest also present within the cells. The chimeric protein is chosen for
its ability to bind to and mark, cleave, actuate or inhibit transcription of, etc. the target gene.
The cells are then maintained under conditions permitting gene expression and protein
production. Again, gene expression may be inducible or cell-type specific, and the cells may be
maintained in culture or within a host organism.

This invention may be applied to virtually any use for which recognition of specific
nucleic acid sequences is critical. For instance, the present invention is useful for gene regulation;
that is, the novel DNA-binding chimeric proteins can be used for specific activation or repression
of transcription of introduced or endogenous genes to control the production of their gene products,
whether in cell culture or in whole organisms. In the context of gene therapy, it may be used to
correct or compensate for abnormal gene expression, control the expression of disease-causing gene
products, direct the expression of a product of a naturally occurring or engineered protein or RNA
of therapeutic or prophylactic value, or to otherwise modify the phenotype of cells introduced
into or present within an organism, including mammalian subjects, and in particular including
human patients. For instance, the invention may be used in gene therapy to increase the
expression of a deficient gene product or decrease expression of a product which is overproduced or
overactive. This invention may also be used to control gene expression in a transgenic organism for
protein production.

The chimeric proteins of the present invention can also be used to identify specific rare
DNA sequences, e.g., for use as markers in gene mapping. To identify a DNA sequence in a
mixture, one provides a mixture containing one or more DNA sequences; contacts the mixture with
a chimeric protein of this invention under conditions permitting the specific binding of a
DNA-binding protein to a recognized DNA sequence; and, determines the occurrence, amount and/or
location of any DNA binding by the chimeric protein. For example, the chimeric protein may be
labeled with a detectable label or with a moiety permitting recovery from the mixture of the
chimeric protein with any bound DNA. Using such materials, one may separately recover the
chimeric protein and any bound DNA from the mixture and isolate the bound DNA from the
protein if desired.

Also, embodiments involving chimeric proteins containing a domain capable of cleaving
DNA provide a new series of sequence-specific endonuclease proteins. Chimeric DNA-binding
proteins of the present invention can also be used to induce or stabilize loop formation in DNA or
to bring together or hold together DNA sites on two or more different molecules.






Brief Description of the Drawings 

Figure 1A-C illustrates selection by ZFHD1 of a hybrid binding site from a pool of
random oligonucleotides. Figure 1A is a graphic representation of the structure of the ZFHD1
chimeric protein used to select binding sites. The underlined residues are from the Zif268-DNA
and Oct-1-DNA crystal structures and correspond to the termini used in computer modeling
studies. The linker contains two glycines, which were included for flexibility and to help span
the required distance between the termini of the domains, and the two arginines that are present
at positions -1 and 1 of the Oct-1 homeodomain. A glutathione S-transferase domain (GST) is
joined to the amino-terminus of zinc finger 1. Figure 1B shows the nucleic acid sequences (SEQ ID
NOS.: 1-16) of 16 sites isolated after four rounds of binding site selection. These sequences were
used to determine the consensus binding sequence (5'-TAATTANGGGNG-3', SEQ ID NO.: 17) of
ZFHD1. Figure 1C shows the alternative possibilities for homeodomain binding configurations
suggested by the consensus sequence; Mode 1 was determined to be the correct optimal
configuration for ZFHD1. The letter "N" at a position indicates that any nucleotide can occupy
that position.
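
As an illustration of how a consensus with "N" positions can be used to search a sequence, the sketch below converts the consensus quoted above into a regular expression and scans both strands. The function names and the test sequence are hypothetical, and a pattern match is only a sequence-level screen; it says nothing about measured binding affinity.

```python
import re
from typing import List, Tuple

CONSENSUS = "TAATTANGGGNG"  # consensus quoted in the text; N = any nucleotide

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus into a regex; a lookahead reports overlapping hits."""
    body = "".join("[ACGT]" if base == "N" else base for base in consensus)
    return re.compile(f"(?=({body}))")


def find_sites(dna: str, consensus: str = CONSENSUS) -> List[Tuple[int, str, str]]:
    """Return (0-based start, strand, bases as written in the input) per match."""
    dna = dna.upper()
    forward = consensus_to_regex(consensus)
    reverse = consensus_to_regex(reverse_complement(consensus))
    hits = [(m.start(), "+", m.group(1)) for m in forward.finditer(dna)]
    hits += [(m.start(), "-", m.group(1)) for m in reverse.finditer(dna)]
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence containing one site on the forward strand.
    print(find_sites("GGTAATTAAGGGCGTT"))
```

Searching the reverse complement as well reflects the fact that a site on either strand of double-stranded DNA presents the same duplex to the protein.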

Figure 2A-C is an autoradiograph illustrating the DNA-binding specificity of ZFHD1,
the Oct-1 POU domain and the three zinc fingers from Zif268. The probes used are listed at the
top of each set of lanes, and the position of the protein-DNA complex is indicated by the arrow.

Figure 3 is a graphic representation of the regulation of promoter activity in vivo by
ZFHD1. The expression vector encoded the ZFHD1 protein fused to the carboxyl-terminal 81
amino acids of VP16 (+ bars), and the empty expression vector Rc/CMV was used as control (-
bars). Bar graphs represent the average of three independent trials. Actual values and standard
deviation reading from left to right are: 1.00 ± .05, 3.30 ± .63; 0.96 ± .08, 42.2 ± 5.1; 0.76 ± .07,
2.36 ± .34; 1.22 ± .10, 4.22 ± 1.41. Fold induction refers to the level of normalized activity
obtained with the ZFHD1-VP16 expression construct divided by that obtained with Rc/CMV.
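
The fold-induction calculation defined in this legend can be written out directly. The sketch below applies it to two of the value pairs quoted in the legend (0.96 and 42.2; 1.22 and 4.22), on the assumption that each pair is listed as control (- bar) followed by ZFHD1-VP16 (+ bar); the function and variable names are hypothetical.

```python
def fold_induction(activity_with_activator: float, activity_with_control: float) -> float:
    """Normalized activity with the activator divided by that with the control."""
    if activity_with_control <= 0:
        raise ValueError("control activity must be positive")
    return activity_with_activator / activity_with_control


# (control, activator) means for two reporters, as quoted in the legend.
# Assumption: the first value of each pair is the Rc/CMV control.
pairs = [(0.96, 42.2), (1.22, 4.22)]
folds = [fold_induction(plus, minus) for minus, plus in pairs]

if __name__ == "__main__":
    for (minus, plus), fold in zip(pairs, folds):
        print(f"control={minus} activator={plus} fold={fold:.1f}")
```

The calculation ignores the quoted standard deviations; propagating them would require an additional assumption about how the errors combine.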

Figure 4. Panel A illustrates data demonstrating that fusion proteins containing ZFHD1
linked to either a VP16 or p65 transcription activation domain activate transcription of a gene
encoding secreted alkaline phosphatase (SEAP) linked to ZFHD1 binding sites in HT1080 cells.
Panel B illustrates data demonstrating that fusion proteins containing three copies of the FKBP
domain joined to the VP16 or p65 activation domains support FK1012-dependent transcription of
a reporter gene (secreted alkaline phosphatase) linked to a binding site for the ZFHD1
composite DNA-binding domain present in the ZFHD1-FKBP(x3) fusion protein. Panel C
illustrates data from an analogous experiment using a wholly synthetic dimerizer in place of
FK1012.




Figure 5 illustrates in schematic form a chimeric transcription factor of this invention
containing a composite DNA binding domain and a transcription activation domain, bound to its
recognized DNA sequence. Also illustrated is a chimeric protein of this invention containing one
or more FKBP domains, a cognate chimeric protein containing a FRAP FRB domain linked to a
transcription activation domain, and a complex of those two chimeras formed in the presence of
the dimerizer, rapamycin, resulting in the clustering of the transcriptional complex on a
recognized DNA sequence.
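
The dimerizer-dependent arrangement shown schematically in this figure can be summarized as a qualitative logic model. The sketch below is a deliberately coarse illustration with hypothetical names: it treats each component as simply present or absent and ignores dose, affinity and expression level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellState:
    has_dna_binding_fkbp_fusion: bool   # composite DNA-binding region + FKBP domain(s)
    has_activation_domain_fusion: bool  # e.g. FRB domain + transcription activation domain
    dimerizer_present: bool             # e.g. rapamycin
    target_has_binding_site: bool       # recognized sequence linked to the target gene


def dimerizer_dependent_transcription(state: CellState) -> bool:
    """Qualitative model: activation requires all four conditions together."""
    return (
        state.has_dna_binding_fkbp_fusion
        and state.has_activation_domain_fusion
        and state.dimerizer_present
        and state.target_has_binding_site
    )


if __name__ == "__main__":
    with_drug = CellState(True, True, True, True)
    without_drug = CellState(True, True, False, True)
    print(dimerizer_dependent_transcription(with_drug))
    print(dimerizer_dependent_transcription(without_drug))
```

The point of the model is only that the ligand acts as the switch: with both fusion proteins and the binding site in place, transcription is expected to depend on whether the dimerizer is supplied.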

Figure 6 illustrates data demonstrating functional dimerizer-dependent expression of an
hGH target gene resulting from complexation of the ZFHD1-FKBP(x3) fusion protein to a FRAP
FRB-p65 fusion protein and binding of the complex to a ZFHD1 binding site in engineered cells in
whole animals. These data demonstrate that in vivo administration of a dimerizing agent can
regulate gene expression in whole animals of secreted gene products from cells containing the
fusion proteins and a responsive target gene cassette. Human cells (2 x 10^) transfected with
plasmids encoding transcription factors ZFHD1-FKBPx3 and FRB-p65 and a target gene directing
the expression of human growth hormone (hGH) were injected into the skeletal muscles of nu/nu
mice. Mice were treated with the indicated concentration of rapamycin by tail vein injection.
After 17 hours, serum hGH levels were determined by ELISA. Each point represents the mean ± SEM
(n = at least 5 per point). Control animals received either engineered cells without drug or drug
(10* µg/kg) without engineered cells.


Detailed Description of the Invention 

This invention pertains to the design, production and use of chimeric proteins containing a
composite DNA-binding region, e.g., to obtain constitutive or regulated expression, repression,
cleavage or marking of a target gene linked to a nucleotide sequence recognized (i.e., specifically
bound) by the chimeric DNA-binding protein. The composite DNA-binding region is a continuous
polypeptide chain spanning at least two heterologous polypeptide portions representing
component DNA-binding domains. The component polypeptide domains comprise polypeptide
sequences derived from at least two different proteins, polypeptide sequences from at least two
non-adjacent portions of the same protein, or polypeptide sequences which are not found so linked
in nature.

The component polypeptide domains may comprise naturally-occurring or non-naturally-
occurring peptide sequence. The chimeric protein may include more than two DNA-binding
domains. It may also include one or more linker regions comprising one or more amino acid
residues, or include no linker, as appropriate, to join the selected domains. The nucleic acid
sequence recognized by the chimeric DNA-binding protein may include all or a portion of the
sequences bound by the component polypeptide domains. However, the chimeric protein displays
a binding specificity that is distinct from the binding specificity of its individual polypeptide
components.

The invention further involves DNA sequences encoding such chimeric proteins, the
recombinant DNA sequences to which the chimeric proteins bind (i.e., which are recognized by
the composite DNA-binding region), constructs containing a target gene and a DNA sequence
which is recognized by the chimeric DNA-binding protein, and the use of these materials in
applications which depend upon specific recognition of a nucleotide sequence. Such composite
proteins and DNA sequences which encode them are recombinant in the sense that they contain at
least two constituent portions which are not otherwise found directly linked (covalently)
together in nature, at least not in the order, orientation or arrangement present in the recombinant
material. Desirable properties of these proteins include high affinity for specific nucleotide
sequences, low affinity for most other sequences in a complex genome (such as a mammalian
genome), low dissociation rates from specific DNA sites, and novel DNA recognition
specificities distinct from those of known natural DNA-binding proteins. A basic principle of the
design is the assembly of multiple DNA-binding domains into a single protein molecule that
recognizes a long (spanning at least 10 bases, preferably at least 11 or more bases) and complex
DNA sequence with high affinity, presumably through the combined interactions of the
individual domains. A further benefit of this design is the potential avidity derived from
multiple independent protein-DNA interactions.

The practice of this invention generally involves expression of a DNA construct encoding
and capable of directing the expression in a cell of the chimeric protein containing the composite
DNA-binding region and one or more optional, additional domains, as described below. Some
embodiments also make use of a DNA construct containing a target gene and one or more copies of a
DNA sequence to which the chimeric DNA-binding protein is capable of binding, preferably
with high affinity and/or specificity. Some embodiments further involve one or more DNA
constructs encoding and directing the expression of additional proteins capable of modulating the
activity of the DNA-binding protein, e.g., in the case of chimeras containing ligand-binding
domains which complex with one another in the presence of a dimerizing ligand.

In one aspect of the invention, the chimeric proteins are transcription factors which may
contain one or more regulatory domains in addition to the composite DNA-binding region. The
term "transcription factor" is intended to encompass any protein that regulates gene
transcription, and includes regulators that have a positive or a negative effect on transcription
initiation or progression. Transcription factors may optionally contain one or more regulatory
domains. The term "regulatory domain" is defined as any domain which regulates transcription,
and includes both activation domains and repression domains. The term "activation domain"
denotes a domain in a transcription factor which positively regulates (turns on or increases) the
rate of gene transcription. The term "repression domain" denotes a domain in a transcription
factor which negatively regulates (turns off, inhibits or decreases) the rate of gene transcription.
The nucleic acid sequence bound by a transcription factor is typically DNA outside the coding
region, such as within a promoter or regulatory element region. However, sufficiently tight
binding to nucleotides at other locations, e.g., within the coding sequence, can also be used to
regulate gene expression.

Preferably the chimeric DNA-binding protein binds to a corresponding DNA sequence
selectively, i.e., observably binds to that DNA sequence despite the presence of numerous
alternative candidate DNA sequences. Preferably, binding of the chimeric DNA-binding protein
to the selected DNA sequence is at least two, more preferably three and even more preferably
more than four orders of magnitude greater than binding to any one alternative DNA sequence, as
may be measured by relative Kd values or by relative rates or levels of transcription of genes
associated with the selected and any alternative DNA sequences. It is also preferred that the
selected DNA sequence be recognized to a substantially greater degree by the chimeric protein
containing the composite DNA-binding region than by a protein containing only some of the
individual polypeptide components thereof. Thus, for example, target gene expression is
preferably two, more preferably three, and even more preferably more than four orders of
magnitude greater in the presence of a chimeric transcription factor containing a composite DNA-
binding region than in the presence of a protein containing only some of the components of that
composite DNA-binding region.
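
The selectivity thresholds stated above (two, three, or more than four orders of magnitude, as measured by relative Kd values) can be expressed as a small calculation. The sketch below is illustrative only; the function names and the Kd values used in the example are hypothetical and are not taken from the experiments described here.

```python
import math


def selectivity_orders(kd_selected: float, kd_alternative: float) -> float:
    """log10 of Kd(alternative site) / Kd(selected site).

    A larger value means tighter binding to the selected site
    relative to the alternative site.
    """
    if kd_selected <= 0 or kd_alternative <= 0:
        raise ValueError("dissociation constants must be positive")
    return math.log10(kd_alternative / kd_selected)


def preference_tier(orders: float) -> str:
    """Map a selectivity (in orders of magnitude) onto the stated preferences."""
    if orders > 4:
        return "more than four orders of magnitude"
    if orders >= 3:
        return "at least three orders of magnitude"
    if orders >= 2:
        return "at least two orders of magnitude"
    return "below the preferred range"


# Hypothetical values: 0.1 nM for the selected site, 5 uM for an alternative site.
orders = selectivity_orders(1e-10, 5e-6)
print(f"{orders:.2f} orders -> {preference_tier(orders)}")
```

The same comparison could be made with relative transcription levels in place of Kd values, with the ratio inverted so that higher expression from the selected site gives a larger value.
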

Additional guidance for practicing various aspects of the invention, together with
additional illustrations, is provided below.

1. Design of Composite DNA-binding Regions. Each composite DNA-binding region
consists of a continuous polypeptide region containing two or more component heterologous
polypeptide portions which are individually capable of recognizing (i.e., binding to) specific
nucleotide sequences. The individual component portions may be separated by a linker
comprising one or more amino acid residues intended to permit the simultaneous contact of each
component polypeptide portion with the DNA target. The combined action of the composite
DNA-binding region formed by the component DNA-binding modules is thought to result in the
addition of the free energy decrement of each set of interactions. The effect is to achieve a DNA-
protein interaction of very high affinity, preferably with dissociation constant below 10^-9 M,
more preferably below 10^-10 M, even more preferably below 10^-11 M. This goal is often best
achieved by combining component polypeptide regions that bind DNA poorly on their own, that
is, with low affinity, insufficient for functional recognition of DNA under typical conditions in a
mammalian cell. Because the hybrid protein exhibits affinity for the composite site several
orders of magnitude higher than the affinities of the individual sub-domains for their subsites,
the protein preferentially (preferably exclusively) occupies the "composite" site, which
typically comprises a nucleotide sequence spanning the individual DNA sequences recognized by
the individual component polypeptide portions of the composite DNA-binding region.
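
The statement that the free energy decrements of the component interactions add can be illustrated numerically. The sketch below assumes ideal additivity, a 1 M standard state and a temperature of 298.15 K, and ignores any energetic cost or benefit contributed by the linker; the function names and the example Kd values are hypothetical. It is an idealized estimate, not a prediction for any particular protein.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard free energy of binding (kcal/mol) for a dissociation
    constant expressed in mol/L, relative to a 1 M standard state."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def composite_kd(component_kds, temp_k: float = 298.15) -> float:
    """Kd expected if the free energies of the components simply add."""
    total_dg = sum(binding_free_energy(kd, temp_k) for kd in component_kds)
    return math.exp(total_dg / (R_KCAL * temp_k))


# Two hypothetical modules that each bind weakly on their own (Kd = 1 uM).
weak_modules = [1e-6, 1e-6]
print(f"per-module dG: {binding_free_energy(1e-6):.2f} kcal/mol")
print(f"idealized composite Kd: {composite_kd(weak_modules):.1e} M")
```

Under these assumptions the composite Kd is the product of the component Kd values, which is why two domains with modest individual affinities can in principle give a very tight composite interaction.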

Suitable component DNA-binding polypeptides for incorporation into a composite region 
have one or more, preferably more, of the following properties. They bind DNA as monomers, 
although dimers can be accommodated. They should have modest affinities for DNA, with 
dissociation constants preferably in the range of 10^-6 to 10^-9 M. They should optimally belong to a
class of DNA-binding domains whose structure and interaction with DNA are well understood
and therefore amenable to manipulation. For gene therapy applications, they are preferably
derived from human proteins.

A structure-based strategy of fusing known DNA-binding modules has been used to design
transcription factors with novel DNA-binding specificities. In order to visualize how certain
DNA-binding domains might be fused to other DNA-binding domains, computer modeling studies
have been used to superimpose and align various protein-DNA complexes.

Two criteria suggest which alignments of DNA-binding domains have potential for
combination into a composite DNA-binding region: (1) lack of collision between domains, and (2)
consistent positioning of the carboxyl- and amino-terminal regions of the domains, i.e., the
domains must be oriented such that the carboxyl-terminal region of one polypeptide can be joined
to the amino-terminal region of the next polypeptide, either directly or by a linker (indirectly).
Domains positioned such that only the two amino-terminal regions are adjacent to each other or
only the two carboxyl-terminal regions are adjacent to each other are not suitable for inclusion in
the chimeric proteins of the present invention. When detailed structural information about the
protein-DNA complexes is not available, it may be necessary to experiment with various
endpoints, and more biochemical work may be necessary to characterize the DNA-binding
properties of the chimeric proteins. This optimization can be performed using known techniques.
Virtually any domains satisfying the above-described criteria are candidates for inclusion in the
chimeric protein. Alternatively, non-computer modeling may also be used.



2. Examples of suitable component DNA-binding domains. DNA-binding domains with
appropriate DNA binding properties may be selected from several different types of natural
DNA-binding proteins. One class comprises proteins that normally bind DNA only in conjunction
with auxiliary DNA-binding proteins, usually in a cooperative fashion, where both proteins
contact DNA and each protein contacts the other. Examples of this class include the
homeodomain proteins, many of which bind DNA with low affinity and poor specificity, but act
with high levels of specificity in vivo due to interactions with partner DNA-binding proteins.
One well-characterized example is the yeast alpha2 protein, which binds DNA only in
cooperation with another yeast protein, Mcm1. Another example is the human homeodomain
protein Phox1, which interacts cooperatively with the human transcription factor, serum
response factor (SRF).

The homeodomain is a highly conserved DNA-binding domain which has been found in
hundreds of transcription factors (Scott et al., Biochim. Biophys. Acta 989:25-48 (1989) and
Rosenfeld, Genes Dev. 5:897-907 (1991)). The regulatory function of a homeodomain protein
derives from the specificity of its interactions with DNA and presumably with components of
the basic transcriptional machinery, such as RNA polymerase or accessory transcription factors
(Laughon, Biochemistry 30(48):11357 (1991)). A typical homeodomain comprises an
approximately 61-amino acid residue polypeptide chain, folded into three alpha helices,
which binds to DNA.

A second class comprises proteins in which the DNA-binding domain is comprised of
multiple reiterated modules that cooperate to achieve high-affinity binding of DNA. An
example is the C2H2 class of zinc-finger proteins, which typically contain a tandem array of
from two or three to dozens of zinc-finger modules. Each module contains an alpha-helix capable
of contacting a three base-pair stretch of DNA. Typically, at least three zinc-fingers are
required for high-affinity DNA binding. Therefore, one or two zinc-fingers constitute a low-
affinity DNA-binding domain with suitable properties for use as a component in this invention.
Examples of proteins of the C2H2 class include TFIIIA, Zif268, GLI, and SRE-ZBP. (These and
other proteins and DNA sequences referred to herein are well known in the art. Their sources and
sequences are known.)

The zinc finger motif, of the type first discovered in transcription factor IIIA (Miller et
al., EMBO J. 4:1609 (1985)), offers an attractive framework for studies of transcription factors
with novel DNA-binding specificities. The zinc finger is one of the most common eukaryotic
DNA-binding motifs (Jacobs, EMBO J. 11:4507 (1992)), and this family of proteins can recognize a
diverse set of DNA sequences (Pavletich and Pabo, Science 261:1701 (1993)). Crystallographic
studies of the Zif268-DNA complex and other zinc finger-DNA complexes show that residues at
four positions within each finger make most of the base contacts, and there has been some
discussion about rules that may explain zinc finger-DNA recognition (Desjarlais and Berg, PNAS
89:7345 (1992) and Klevit, Science 253:1367 (1991)). However, studies have also shown that zinc
fingers can dock against DNA in a variety of ways (Pavletich and Pabo (1993) and Fairall et al.,
Nature 366:483 (1993)).

A third general class comprises proteins that themselves contain multiple independent
DNA-binding domains. Often, any one of these domains is insufficient to mediate high-affinity
DNA recognition, and cooperation with a covalently linked partner domain is required.
Examples include the POU class, such as Oct-1, Oct-2 and Pit-1, which contain both a
homeodomain and a POU-specific domain; HNF1, which is organized similarly to the POU
proteins; certain Pax proteins (examples: Pax-3, Pax-6), which contain both a homeodomain and
a paired box domain; and XXX, which contains a homeodomain and multiple zinc-fingers of the
C2H2 class.

From a structural perspective, DNA-binding proteins containing domains suitable for use
as polypeptide components of a composite DNA-binding region may be classified as DNA-
binding proteins with a helix-turn-helix structural design, including, but not limited to, MAT α1,
MAT α2, MAT a1, Antennapedia, Ultrabithorax, Engrailed, Paired, Fushi tarazu, HOX, Unc86,
and the previously noted Oct1, Oct2 and Pit; zinc finger proteins, such as Zif268, SWI5, Krüppel
and Hunchback; steroid receptors; DNA-binding proteins with the helix-loop-helix structural
design, such as Daughterless, Achaete-scute (T3), MyoD, E12 and E47; and other helical motifs
like the leucine-zipper, which includes GCN4, C/EBP, c-Fos/c-Jun and JunB. The amino acid
sequences of the component DNA-binding domains may be naturally-occurring or non-naturally-
occurring (or modified).

The choice of component DNA-binding domains may be influenced by a number of
considerations, including the species, system and cell type which is targeted; the feasibility of
incorporation into a chimeric protein, as may be shown by modeling; and the desired application
or utility. The choice of DNA-binding domains may also be influenced by the individual DNA
sequence specificity of the domain and the ability of the domain to interact with other proteins
or to be influenced by a particular cellular regulatory pathway. Preferably, the distance between
domain termini is relatively short to facilitate use of the shortest possible linker or no linker.
The DNA-binding domains can be isolated from a naturally-occurring protein, or may be a
synthetic molecule based in whole or in part on a naturally-occurring domain.



An additional strategy for obtaining component DNA-binding domains with properties
suitable for this invention is to modify an existing DNA-binding domain to reduce its affinity for
DNA into the appropriate range. For example, a homeodomain such as that derived from the
human transcription factor Phox1 may be modified by substitution of the glutamine residue at
position 50 of the homeodomain. Substitutions at this position remove or change an important
point of contact between the protein and one or two base pairs of the 6-bp DNA sequence
recognized by the protein. Thus, such substitutions reduce the free energy of binding and the
affinity of the interaction with this sequence and may or may not simultaneously increase the
affinity for other sequences. Such a reduction in affinity is sufficient to effectively eliminate
occupancy of the natural target site by this protein when produced at typical levels in
mammalian cells. But it would allow this domain to contribute binding energy to, and therefore
cooperate with, a second linked DNA-binding domain. Other domains that are amenable to this type
of manipulation include the paired box, the zinc-finger class represented by steroid hormone
receptors, the myb domain, and the ets domain.






3. Design of linker sequence for covalently linked composite DBDs. The continuous
polypeptide span of the composite DNA-binding domain may contain the component
polypeptide modules linked directly end-to-end or linked indirectly via an intervening amino
acid or peptide linker. A linker moiety may be designed or selected empirically to permit the
independent interaction of each component DNA-binding domain with DNA without steric
interference. A linker may also be selected or designed so as to impose specific spacing and
orientation on the DNA-binding domains. The linker amino acids may be derived from
endogenous flanking peptide sequence of the component domains or may comprise one or more
heterologous amino acids. Linkers may be designed by modeling or identified by experimental
trial.

The linker may be any amino acid sequence that results in linkage of the component
domains such that they retain the ability to bind their respective nucleotide sequences. In some
embodiments it is preferable that the design involve an arrangement of domains which requires
the linker to span a relatively short distance, preferably less than about 10 Å. However, in
certain embodiments, depending upon the selected DNA-binding domains,
the linker may span a distance of up to about 50 Å. For instance, the ZFHD1 protein contains a
glycine-glycine-arginine-arginine linker which joins the carboxyl-terminal region of zinc finger 2
to the amino-terminal region of the Oct-1 homeodomain.



Within the linker, the amino acid sequence may be varied based on the preferred
characteristics of the linker as determined empirically or as revealed by modeling. For instance,
in addition to a desired length, modeling studies may show that side groups of certain
nucleotides or amino acids may interfere with binding of the protein. The primary criterion is
that the linker join the DNA-binding domains in such a manner that they retain their ability to
bind their respective DNA sequences, and thus a linker which interferes with this ability is
undesirable. A desirable linker should also be able to constrain the relative three-dimensional
positioning of the domains so that only certain composite sites are recognized by the chimeric
protein. Other considerations in choosing the linker include flexibility of the linker, charge of
the linker and selected binding domains, and presence of some amino acids of the linker in the
naturally-occurring domains. The linker can also be designed such that residues in the linker
contact DNA, thereby influencing binding affinity or specificity, or to interact with other
proteins. For example, a linker may contain an amino acid sequence which can be recognized by a
protease so that the activity of the chimeric protein could be regulated by cleavage. In some
cases, particularly when it is necessary to span a longer distance between the two DNA-binding
domains or when the domains must be held in a particular configuration, the linker may
optionally contain an additional folded domain.

4. Additional domains. Additional domains may be included in the various chimeric
proteins of this invention, e.g., a nuclear localization sequence, a transcription regulatory domain,
a ligand binding domain, a protein-binding domain, a domain capable of cleaving a nucleic acid,
etc.

For example, in some embodiments the chimeric proteins will contain a cellular targeting
sequence which provides for the protein to be translocated to the nucleus. Typically a nuclear
localization sequence has a plurality of basic amino acids, referred to as a bipartite basic repeat
(reviewed in Garcia-Bustos et al., Biochimica et Biophysica Acta (1991) 1071, 83-101). This
sequence can appear in any portion of the molecule, internal or proximal to the N- or C-terminus,
and results in the chimeric protein being localized inside the nucleus.

The chimeric proteins may include domains that facilitate their purification, e.g.,
"histidine tags" or a glutathione-S-transferase domain. They may include "epitope tags"
encoding peptides recognized by known monoclonal antibodies for the detection of proteins within
cells or the capture of proteins by antibodies in vitro.

A chimeric DNA-binding protein which contains a domain with endonuclease activity
(a cleavage domain) can also be used as a novel sequence-specific restriction endonuclease to
cleave DNA adjacent to the recognition sequence bound by the chimeric protein. For example,
such a chimeric protein may contain a composite DNA-binding region and the C-terminal
cleavage domain of Fok I endonuclease, which has nonspecific DNA-cleavage activity (Li et al.,
Proc. Natl. Acad. Sci. USA 89:4275-4279 (1992)).

Site-specific restriction enzymes can also be linked to other DNA-binding domains to
generate endonucleases with very strict sequence requirements. The chimeric DNA-binding
proteins can also be fused to other domains that can control the stability, association and
subcellular localization of the new proteins.

The chimeric protein may also include one or more transcriptional activation domains,
such as the well-characterized domain from the viral protein VP16 or novel activation domains
of different designs. For instance, one may use one or multiple copies of transcriptional activating
motifs from human proteins, including, e.g., the 18 amino acid (NFLQLFQQTQGALLTSQP)
glutamine rich region of Oct-2, the N-terminal 72 amino acids of p53, the SYGQQS repeat in
Ewing sarcoma gene or an 11 amino acid (535-545) acidic rich region of Rel A protein. Chimeric
proteins which contain both a composite DNA-binding domain and a transcriptional activating
domain thus comprise composite transcription factors capable of actuating transcription of a
target gene linked to a DNA sequence recognized by the chimeric protein. The chimeric proteins
may include regulatory domains that place the function of the DNA-binding domain under the
control of an external ligand; one example would be the ligand-binding domain of steroid
receptors.

The chimeric proteins may also include a ligand-binding domain to provide for
regulatable interaction of the protein with a second polypeptide chain. In such cases, the
presence of a ligand-binding domain permits association of the chimeric DNA-binding protein, in
the presence of a dimerizing ligand, with a second chimeric protein containing a transcriptional
regulatory domain (activator or repressor) and another ligand-binding domain. Upon
dimerization of the chimeras, a composite DNA-binding protein complex is formed which further
contains the transcriptional regulatory domain and any other optional domains.

Multimerizing ligands useful in practicing this invention are multivalent, i.e., capable of
binding to, and thus multimerizing, two or more of the chimeric protein molecules. The
multimerizing ligand may bind to the chimeras containing such ligand-binding domains, in
either order or simultaneously, preferably with a Kd value below about 10^-6, more preferably
below about 10^-7, even more preferably below about 10^-8, and in some embodiments below about
10^-9 M. The ligand preferably is not a protein or polypeptide and has a molecular weight of less
than about 5 kDa, preferably below 2 kDa. The ligand-binding domains of the chimeric proteins
so multimerized may be the same or different. Ligand-binding domains include, among others,
various immunophilin domains. One example is the FKBP domain, which is capable of binding to
dimerizing ligands incorporating FK506 moieties or other FKBP-binding moieties. See, e.g.,
PCT/US93/01617, the full contents of which are hereby incorporated by reference.

Illustrating the class of chimeric proteins of this invention which contain a composite
DNA-binding domain comprising at least one homeodomain and at least one zinc finger domain
are a set of chimeric proteins in which the composite DNA-binding region comprises an Oct-1
homeodomain and zinc fingers 1 and 2 of Zif268, referred to herein as "ZFHD1". Proteins
comprising the ZFHD1 composite DNA-binding region have been produced and shown to bind a
composite DNA sequence (SEQ ID NO.: 17) which includes the nucleic acid sequences bound by
the relevant portion of the two component DNA-binding proteins.

Illustrating the class of chimeric DNA-binding proteins of this invention which further
contain at least one transcription activation domain are chimeric proteins containing the ZFHD1
composite DNA-binding region and the Herpes Simplex Virus VP16 activation domain, which
have been produced and shown to activate transcription selectively in vivo of a gene (the
luciferase gene) linked to an iterated ZFHD1 binding site. Another chimeric protein containing
ZFHD1 and a NF-kB p65 activation domain has also been produced and shown to activate
transcription in vivo of a gene (secreted alkaline phosphatase) linked to iterated ZFHD1
binding sites.

Transcription factors can be tested for activity in vivo using a simple assay (F.M. Ausubel
et al., Eds., Current Protocols in Molecular Biology (John Wiley & Sons, New York, 1994); de
Wet et al., Mol. Cell. Biol. 7:725 (1987)). The in vivo assay requires a plasmid containing and
capable of directing the expression of a recombinant DNA sequence encoding the transcription
factor. The assay also requires a plasmid containing a reporter gene, e.g., the luciferase gene, the
chloramphenicol acetyl transferase (CAT) gene, secreted alkaline phosphatase or the human
growth hormone (hGH) gene, linked to a binding site for the transcription factor. The two
plasmids are introduced into host cells which normally do not produce interfering levels of the
reporter gene product. A second group of cells, which also lack both the gene encoding the
transcription factor and the reporter gene, serves as the control group and receives a plasmid
containing the gene encoding the transcription factor and a plasmid containing the test gene
without the binding site for the transcription factor.

The production of mRNA or protein encoded by the reporter gene is measured. An increase
in reporter gene expression not seen in the controls indicates that the transcription factor is a
positive regulator of transcription. If reporter gene expression is less than that of the control, the
transcription factor is a negative regulator of transcription.

Optionally, the assay may include a transfection efficiency control plasmid. This
plasmid expresses a gene product independent of the test gene, and the amount of this gene
product indicates roughly how many cells are taking up the plasmids and how efficiently the
DNA is being introduced into the cells. Additional guidance on evaluating chimeric proteins of
this invention is provided below.
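
The interpretation of the reporter assay described above can be sketched as a simple comparison. In this illustration the reporter signal is first divided by the signal from the transfection efficiency control and then compared with the similarly normalized control group. The function names, the 10% tolerance band and the numerical readings are all hypothetical; an actual analysis would rest on replicates and an appropriate statistical test.

```python
def normalize(reporter_signal: float, efficiency_signal: float) -> float:
    """Reporter signal corrected for transfection efficiency."""
    if efficiency_signal <= 0:
        raise ValueError("efficiency control signal must be positive")
    return reporter_signal / efficiency_signal


def classify_regulator(test_norm: float, control_norm: float,
                       tolerance: float = 0.10) -> str:
    """Compare normalized reporter expression in test cells with control cells.

    `tolerance` is an assumed band around 1.0 within which no effect is called.
    """
    if control_norm <= 0:
        raise ValueError("control expression must be positive")
    ratio = test_norm / control_norm
    if ratio > 1.0 + tolerance:
        return "positive regulator"
    if ratio < 1.0 - tolerance:
        return "negative regulator"
    return "no detectable effect"


# Hypothetical readings: (reporter signal, efficiency-control signal).
test_cells = normalize(880.0, 20.0)      # reporter linked to the binding site
control_cells = normalize(18.0, 18.0)    # reporter lacking the binding site
print(classify_regulator(test_cells, control_cells))
```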

5. Design and assembly of constructs. DNA sequences encoding individual DNA-binding
sub-domains and linkers, if any, are joined such that they constitute a single open reading frame
encoding a chimeric protein containing the composite DNA-binding region and capable of being
translated in cells or cell lysates into a single polypeptide harboring all component domains.
This protein-encoding DNA sequence is then placed into a conventional plasmid vector that
directs the expression of the protein in the appropriate cell type. For testing of proteins and
determination of binding specificity and affinity, it may be desirable to construct plasmids that
direct the expression of the protein in bacteria or in reticulocyte-lysate systems. For use in the
production of proteins in mammalian cells, the protein-encoding sequence is introduced into an
expression vector that directs expression in these cells. Expression vectors suitable for such uses
are well known in the art. Various sorts of such vectors are commercially available.
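
The requirement that the joined coding segments constitute a single open reading frame can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical, the domain segments are short placeholder sequences rather than real coding sequences, and the linker codons shown (GGT GGT CGT CGT, encoding Gly-Gly-Arg-Arg) are one arbitrary choice among the possible codons for that peptide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_orf(segments):
    """Join coding segments in order and verify that the result is one
    continuous open reading frame (start codon, no internal stop codon,
    terminal stop codon, length a multiple of three)."""
    seq = "".join(segment.upper() for segment in segments)
    if not seq:
        raise ValueError("no sequence supplied")
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of three; a segment shifts the frame")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG":
        raise ValueError("reading frame does not begin with ATG")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("reading frame does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("internal stop codon would truncate the chimeric protein")
    return seq


# Placeholder segments (hypothetical, for illustration only).
construct = assemble_orf([
    "ATG",            # start codon
    "GAACGC",         # stand-in for the first DNA-binding sub-domain
    "GGTGGTCGTCGT",   # Gly-Gly-Arg-Arg linker
    "AAACGT",         # stand-in for the second DNA-binding sub-domain
    "TAA",            # stop codon
])
print(construct)
```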

In embodiments involving composite DNA-binding proteins or accessory chimeric proteins
which contain multiple domains, e.g., proteins containing a ligand binding domain and/or a
transcription regulatory domain, DNA sequences encoding the constituent domains, with any
introduced sequence alterations, may be ligated or otherwise joined together such that they
constitute a single open reading frame that can be translated in cells into a single polypeptide
harboring all constituent domains. The order and arrangement of the domains within the
polypeptide can vary as desired.

6. Target DNA sequence. The DNA sequences recognized by a chimeric protein containing
a composite DNA-binding domain can be determined experimentally, as described below, or the
proteins can be manipulated to direct their specificity toward a desired sequence. A desirable
nucleic acid recognition sequence consists of a nucleotide sequence spanning at least ten, preferably
eleven, and more preferably twelve or more bases. The component binding portions (putative or
demonstrated) within the nucleotide sequence need not be fully contiguous; they may be
interspersed with "spacer" base pairs that need not be directly contacted by the chimeric protein
but rather impose proper spacing between the nucleic acid subsites recognized by each module.
These sequences should not impart expression to linked genes when introduced into cells in the
absence of the engineered DNA-binding protein.
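
The arrangement of subsites and spacer base pairs described above can be enumerated systematically when candidate composite sites are designed for testing. The following sketch is illustrative only: the function names are hypothetical, the two subsites are arbitrary placeholder sequences, spacer positions are written as N (any base), and only arrangements with the first subsite upstream of the second are generated.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return site.upper().translate(_COMPLEMENT)[::-1]


def candidate_composite_sites(subsite_a: str, subsite_b: str,
                              max_spacer: int = 3, min_length: int = 10):
    """List candidate composite sites: subsite A, 0..max_spacer spacer bases,
    subsite B, with each subsite tried in both orientations. Candidates shorter
    than `min_length` bases are discarded."""
    orientations_a = (subsite_a.upper(), reverse_complement(subsite_a))
    orientations_b = (subsite_b.upper(), reverse_complement(subsite_b))
    candidates = []
    for a, b, spacer in product(orientations_a, orientations_b, range(max_spacer + 1)):
        site = a + "N" * spacer + b
        if len(site) >= min_length:
            candidates.append(site)
    # Remove duplicates (possible with palindromic subsites) while keeping order.
    return list(dict.fromkeys(candidates))


# Placeholder subsites (hypothetical, not taken from any sequence listing).
for site in candidate_composite_sites("AAAAAC", "GGGTTT", max_spacer=2):
    print(site)
```

Each candidate would then be synthesized and its affinity measured, so that the optimum spacing and orientation are determined experimentally rather than assumed.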

To identify a nucleotide sequence that is recognized by a chimeric protein containing the
composite DNA-binding region, preferably recognized with high affinity (dissociation constants of
10^-11 M or lower are especially preferred), several methods can be used. If high-affinity binding
sites for individual subdomains of the composite DNA-binding region are already known, then 
diese sequences can be joined with various spacing and orientation and the optimum configuration 
determined experimentally (see bdow for methods for determining affinities). Alternatively, 

10 high*aHinity binding sitfes for the protein or protein complex can be selected from a large pool of 
random DNA sequences by adaptation of published methods (Pollock, R. and Treismaa R*, 1990, 
A sensitive method for the determination of protein-DNA binding spedfidties. NucL Acids Res. 
18, 6197-6204). Bound sequences are doned into a plasmid and their precise sequence and affinity 
for the proteins are determined. From this collection of sequences, individual sequences with 

15 desirable characteristics (f.f., maximal affinity for composite protein, minimal affinity for 
individual subdomains) are selected for use. Alternatively, the collection of sequences is used to 
derive a consensus sequence that carries the favored base pairs at eadi position. Such a consensus 
sequertce is synthesized and tested (see bdow) to confirm that it has an appropriate level of 
affinity and specifidty. 
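
The consensus-derivation step described above can be sketched as follows. This is only an illustration, assuming the selected sites have already been aligned to a common length; the function name `consensus`, the 0.6 frequency threshold, and the example sequences are hypothetical and are not taken from this application.

```python
from collections import Counter


def consensus(sites, threshold=0.6):
    """Return a consensus string for equal-length, pre-aligned DNA sites.

    A base is reported at a position only if its frequency is at least
    `threshold` (an assumed cutoff); otherwise the position is written as N.
    """
    if not sites:
        raise ValueError("no sites given")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    out = []
    for column in zip(*sites):
        # Most frequent base in this column and how often it occurs.
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(sites) >= threshold else "N")
    return "".join(out)


# Hypothetical aligned sites, for illustration only.
example_sites = ["TAATTAGGGG", "TAATTACGGG", "TAATGATGGG", "TAATTAAGGG"]
print(consensus(example_sites))
```

Positions where no single base reaches the threshold are reported as N, which mirrors the way degenerate positions are written in the binding sites quoted in this document.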

7. Design of target gene construct. A DNA construct that enables the target gene to be
regulated, cleaved, etc. by DNA-binding proteins of this invention is a fragment, plasmid, or
other nucleic acid vector carrying a synthetic transcription unit typically consisting of: (1) one
copy or multiple copies of a DNA sequence recognized with high affinity by the composite
DNA-binding protein; (2) a promoter sequence consisting minimally of a TATA box and initiator
sequence but optionally including other transcription factor binding sites; (3) sequence encoding
the desired product (protein or RNA), including sequences that promote the initiation and
termination of translation, if appropriate; (4) an optional sequence consisting of a splice donor,
splice acceptor, and intervening intron DNA; and (5) a sequence directing cleavage and
polyadenylation of the resulting RNA transcript.

8. Determination of binding affinity. A number of well-characterized assays are
available for determining the binding affinity, usually expressed as dissociation constant, for
DNA-binding proteins and the cognate DNA sequences to which they bind. These assays usually
require the preparation of purified protein and binding site (usually a synthetic oligonucleotide)
of known concentration and specific activity. Examples include electrophoretic mobility-shift
assays, DNase I protection or "footprinting", and filter-binding. These assays can also be used to
get rough estimates of association and dissociation rate constants. These values may be
determined with greater precision using a BIAcore instrument. In this assay, the synthetic
oligonucleotide is bound to the assay "chip," and purified DNA-binding protein is passed
through the flow-cell. Binding of the protein to the DNA immobilized on the chip is measured
as an increase in refractive index. Once protein is bound at equilibrium, buffer without protein is
passed over the chip, and the dissociation of the protein results in a return of the refractive index
to baseline value. The rates of association and dissociation are calculated from these curves, and
the affinity or dissociation constant is calculated from these rates. Binding rates and affinities
for the high affinity composite site may be compared with the values obtained for subsites
recognized by each subdomain of the protein. As noted above, the difference in these dissociation
constants should be at least two orders of magnitude and preferably three or greater.
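
As a worked illustration of the last two steps, the dissociation constant can be taken as the ratio of the dissociation rate constant to the association rate constant (Kd = k_off / k_on). This relation holds for simple 1:1 binding, which is an assumption made here and not a statement from this application. The rate constants and function names below are hypothetical.

```python
import math


def dissociation_constant(k_on, k_off):
    """Kd in M from k_on (M^-1 s^-1) and k_off (s^-1), assuming 1:1 binding."""
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on


def orders_of_magnitude(kd_subsite, kd_composite):
    """log10 of the ratio between two dissociation constants."""
    return math.log10(kd_subsite / kd_composite)


# Hypothetical rate constants, for illustration only.
kd_composite = dissociation_constant(k_on=1e6, k_off=1e-5)  # composite site
kd_subsite = dissociation_constant(k_on=1e6, k_off=1e-2)    # isolated subsite

difference = orders_of_magnitude(kd_subsite, kd_composite)
print(f"Kd(composite) = {kd_composite:.1e} M")
print(f"Kd(subsite)   = {kd_subsite:.1e} M")
print("meets the two-orders-of-magnitude criterion:", difference >= 2)
```

The final comparison corresponds to the criterion stated in the text, that the composite site should be bound at least two, and preferably three or more, orders of magnitude more tightly than the subsites.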

9. Testing for function in vivo. Several tests of increasing stringency may be used to
confirm the satisfactory performance of a DNA-binding protein designed according to this
invention. All share essentially the same components: (1) (a) an expression plasmid directing the
production of a chimeric protein comprising the composite DNA-binding region and a potent
transcriptional activation domain or (b) one or more expression plasmids directing the production
of a pair of chimeric proteins of this invention which are capable of dimerizing in the presence of
a corresponding dimerizing agent, and thus forming a protein complex containing a composite
DNA-binding region on one protein and a transcription activation domain on the other; and (2) a
reporter plasmid directing the expression of a reporter gene, preferably identical in design to the
target gene described above (i.e., multiple binding sites for the DNA-binding domain, a minimal
promoter element, and a gene body) but encoding any conveniently measured protein.

In a transient transfection assay, the above-mentioned plasmids are introduced together
into tissue culture cells by any conventional transfection procedure, including for example calcium
phosphate coprecipitation, electroporation, and lipofection. After an appropriate time period,
usually 24-48 hr, the cells are harvested and assayed for production of the reporter protein. In
embodiments requiring dimerization of chimeric proteins for activation of transcription, the
assay is conducted in the presence of the dimerizing agent. In an appropriately designed system,
the reporter gene should exhibit little activity above background in the absence of any
co-transfected plasmid for the composite transcription factor (or in the absence of dimerizing agent
in embodiments under dimerizer control). In contrast, reporter gene expression should be elevated
in a dose-dependent fashion by the inclusion of the plasmid encoding the composite transcription
factor (or plasmids encoding the multimerizable chimeras, following addition of multimerizing
agent). This result indicates that there are few natural transcription factors in the recipient cell
with the potential to recognize the tested binding site and activate transcription and that the
engineered DNA-binding domain is capable of binding to this site inside living cells.

The transient transfection assay is not an extremely stringent test in most cases, because
the high concentrations of plasmid DNA in the transfected cells lead to unusually high
concentrations of the DNA-binding protein and its recognition site, allowing functional
recognition even with relatively low affinity interactions. A more stringent test of the system is a
transfection that results in the integration of the introduced DNAs at near single-copy. Thus,
both the protein concentration and the ratio of specific to non-specific DNA sites would be very
low; only very high affinity interactions would be expected to be productive. This scenario is
most readily achieved by stable transfection in which the plasmids are transfected together
with another DNA encoding an unrelated selectable marker (e.g., G418-resistance). Transfected
cell clones selected for drug resistance typically contain copy numbers of the nonselected plasmids
ranging from zero to a few dozen. A set of clones covering that range of copy numbers can be used to
obtain a reasonably clear estimate of the efficiency of the system.

Perhaps the most stringent test involves the use of a viral vector, typically a retrovirus,
that incorporates both the reporter gene and the gene encoding the composite transcription factor
or multimerizable components thereof. Virus stocks derived from such a construction will
generally lead to single-copy transduction of the genes.

If the ultimate application is gene therapy, it may be preferred to construct transgenic 
animals carrying similar DNAs to determine whether the protein is functional in an animal. 

11. Introduction of Constructs into Cells 

Constructs encoding the chimeras containing a composite DNA-binding region, constructs
encoding related chimeric proteins (e.g. in the case of ligand-dependent applications) and
constructs directing the expression of target genes, all as described herein, can be introduced into
cells as one or more DNA molecules or constructs, in many cases in association with one or more
markers to allow for selection of host cells which contain the construct(s). The constructs can be
prepared in conventional ways, where the coding sequences and regulatory regions may be
isolated, as appropriate, ligated, cloned in an appropriate cloning host, analyzed by restriction
or sequencing, or other convenient means. Particularly, using PCR, individual fragments including
all or portions of a functional unit may be isolated, where one or more mutations may be
introduced using "primer repair", ligation, in vitro mutagenesis, etc. as appropriate. The
construct(s) once completed and demonstrated to have the appropriate sequences may then be
introduced into a host cell by any convenient means. The constructs may be integrated and
packaged into non-replicating, defective viral genomes like Adenovirus, Adeno-associated virus
(AAV), or Herpes simplex virus (HSV) or others, including retroviral vectors, for infection or
transduction into cells. The constructs may include viral sequences for transfection, if desired.
Alternatively, the construct may be introduced by fusion, electroporation, biolistics, transfection,
lipofection, or the like. The host cells will in some cases be grown and expanded in culture before
introduction of the construct(s), followed by the appropriate treatment for introduction of the
construct(s) and integration of the construct(s). The cells will then be expanded and screened by
virtue of a marker present in the construct. Various markers which may be used successfully
include hprt, neomycin resistance, thymidine kinase, hygromycin resistance, etc.

In some instances, one may have a target site for homologous recombination, where it is
desired that a construct be integrated at a particular locus. For example, one can delete and/or
replace an endogenous gene (at the same locus or elsewhere) with a recombinant target construct
of this invention. For homologous recombination, one may generally use either Ω or O-vectors.
See, for example, Thomas and Capecchi, Cell (1987) 51, 503-512; Mansour, et al., Nature (1988)
336, 348-352; and Joyner, et al., Nature (1989) 338, 153-156.
The constructs may be introduced as a single DNA molecule encoding all of the genes, or
different DNA molecules having one or more genes. The constructs may be introduced
simultaneously or consecutively, each with the same or different markers.

Vectors containing useful elements such as bacterial or yeast origins of replication,
selectable and/or amplifiable markers, promoter/enhancer elements for expression in
procaryotes or eucaryotes, etc., which may be used to prepare stocks of construct DNAs and for
carrying out transfections are well known in the art, and many are commercially available.

12. Introduction of Constructs into Animals

Cells which have been modified ex vivo with the DNA constructs may be grown in
culture under selective conditions and cells which are selected as having the desired construct(s)
may then be expanded and further analyzed, using, for example, the polymerase chain reaction
for determining the presence of the construct in the host cells. Once modified host cells have been
identified, they may then be used as planned, e.g. grown in culture or introduced into a host
organism.




Depending upon the nature of the cells, the cells may be introduced into a host organism,
e.g. a mammal, in a wide variety of ways. Hematopoietic cells may be administered by injection
into the vascular system, there being usually at least about 10^[illegible] cells and generally not
more than about 10^[illegible], more usually not more than about 10^[illegible] cells. The number of
cells which are employed will depend upon a number of circumstances: the purpose for the
introduction, the lifetime of the cells, the protocol to be used, for example, the number of
administrations, the ability of the cells to multiply, the stability of the therapeutic agent, the
physiologic need for the therapeutic agent, and the like. Alternatively, with skin cells which may be
used as a graft, the number of cells would depend upon the size of the layer to be applied to the
burn or other lesion. Generally, for myoblasts or fibroblasts, the number of cells will be at least
about 10^[illegible] and not more than about 10^8 and may be applied as a dispersion, generally being
injected at or near the site of interest. The cells will usually be in a physiologically-acceptable
medium.

Cells engineered in accordance with this invention may also be encapsulated, e.g. using
conventional materials and methods. See e.g. Uludag and Sefton, 1993, J Biomed. Mater. Res.
27(10):1213-24; Chang et al, 1993, Hum Gene Ther 4(4):433-40; Reddy et al, 1993, J Infect Dis
168(4):1082-3; Tai and Sun, 1993, FASEB J 7(11):1061-9; Emerich et al, 1993, Exp Neurol
122(1):37-47; Sagen et al, 1993, J Neurosci 13(6):2415-23; Aebischer et al, 1994, Exp Neurol
126(2):151-8; Savelkoul et al, 1994, J Immunol Methods 170(2):185-96; Winn et al, 1994, PNAS USA
91(6):2324-8; Emerich et al, 1994, Prog Neuropsychopharmacol Biol Psychiatry 18(5):935-46 and
Kordower et al, 1994, PNAS USA 91(23):10898-902. The cells may then be introduced in encapsulated
form into an animal host, preferably a mammal and more preferably a human subject in need thereof.
Preferably the encapsulating material is semipermeable, permitting release into the host of
secreted proteins produced by the encapsulated cells. In many embodiments the semipermeable
encapsulation renders the encapsulated cells immunologically isolated from the host organism in
which the encapsulated cells are introduced. In those embodiments the cells to be encapsulated
may express one or more chimeric proteins containing component domains derived from viral
proteins or proteins from other species.

Instead of ex vivo modification of the cells, in many situations one may wish to modify
cells in vivo. For this purpose, various techniques have been developed for modification of target
tissue and cells in vivo. A number of virus vectors have been developed, such as adenovirus,
adeno-associated virus, and retroviruses, which allow for transfection and random integration of
the virus into the host. See, for example, Dubensky et al (1984) Proc. Natl. Acad. Sci. USA 81,
7529-7533; Kaneda et al, (1989) Science 243, 375-378; Hiebert et al (1989) Proc. Natl. Acad. Sci.
USA 86, 3594-3598; Hatzoglu et al (1990) J. Biol. Chem. 265, 17285-17293 and Ferry, et al (1991)
Proc. Natl. Acad. Sci. USA 88, 8377-8381. The vector may be administered by injection, e.g.
intravascularly or intramuscularly, inhalation, or other parenteral mode.

In accordance with in vivo genetic modification, the manner of the modification will
depend on the nature of the tissue, the efficiency of cellular modification required, the number of
opportunities to modify the particular cells, the accessibility of the tissue to the DNA
composition to be introduced, and the like. By employing an attenuated or modified retrovirus
carrying a target transcriptional initiation region, if desired, one can activate the virus using one
of the subject transcription factor constructs, so that the virus may be produced and transfect
adjacent cells.

The DNA introduction need not result in integration in every case. In some situations,
transient maintenance of the DNA introduced may be sufficient. In this way, one could have a
short term effect, where cells could be introduced into the host and then turned on after a
predetermined time, for example, after the cells have been able to home to a particular site.

13. ZFHD1

Illustrating one design approach, Example 1 describes computer modeling studies which
were used to determine the orientation and linkage of potentially useful DNA-binding domains
(see Example 1). Computer modeling studies allowed manipulation and superimposition of the
crystal structures of Zif268 and Oct-1 protein-DNA complexes. This study yielded two
arrangements of the domains which appeared to be suitable for use in a chimeric protein. In one
alignment, the carboxyl-terminal region of zinc finger 2 was 8.8 Å away from the amino-terminal
region of the homeodomain, suggesting that a short polypeptide could connect these domains. In
this model, the chimeric protein would bind a hybrid DNA site with the sequence
5'-AAATNNTGGGCG-3' (SEQ ID NO.: 18). The Oct-1 homeodomain would recognize the AAAT
subsite, zinc finger 2 would recognize the TGG subsite, and zinc finger 1 would recognize the GCG
subsite. No risk of steric interference between the domains was apparent in this model. This
arrangement was used in the work described below and in the Examples.
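
To make the notation of the predicted hybrid site concrete, the following sketch searches a DNA string for 5'-AAATNNTGGGCG-3' on either strand, treating N as any base. The helper names and the test sequence are hypothetical; only the site itself comes from the text. A match found this way says nothing about actual protein binding, which must be determined experimentally as described in this document.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a sequence written with A, C, G, T and N."""
    return seq.translate(COMPLEMENT)[::-1]


def site_to_regex(site):
    """Translate a site into a regular expression, with N matching any base."""
    return "".join("[ACGT]" if base == "N" else base for base in site)


def find_site(site, dna):
    """Return sorted (start, strand) pairs for matches on either strand.

    Starts are 0-based top-strand coordinates. A lookahead is used so that
    overlapping matches are also reported.
    """
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        regex = re.compile("(?=(" + site_to_regex(pattern) + "))")
        hits.extend((m.start(), strand) for m in regex.finditer(dna))
    return sorted(hits)


HYBRID_SITE = "AAATNNTGGGCG"  # predicted hybrid site quoted in the text

# Hypothetical test sequence carrying one site on each strand.
dna = "CCAAATGATGGGCGTTAACGCCCATCATTTGG"
print(find_site(HYBRID_SITE, dna))
```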

The second plausible arrangement would also have a short polypeptide linker spanning
the distance from zinc finger 2 to the homeodomain (less than 10 Å); however, the subsites are
arranged so that the predicted binding sequence is 5'-GCCCANNAAAT-3' (SEQ ID NO.: 19).
This arrangement was not explicitly used in the work described below, although the flexibility
of the linker region may also allow ZFHD1 to recognize this site.

After selecting a suitable arrangement, construction of the corresponding molecule was
carried out. Generally, sequences may be added to the chimeric protein to facilitate expression,
detection, purification or assays of the product by standard methods. A glutathione S-transferase
domain (GST) was attached to ZFHD1 for these purposes (see Example 2).

The consensus binding sequence of the chimeric protein ZFHD1 was determined by
selective binding studies from a random pool of oligonucleotides. The oligonucleotide sequences
bound by the chimeric protein were sequenced and compared to determine the consensus binding
sequence for the chimeric protein (see Example 3 and Figure 1).

After four rounds of selection, 16 sites were cloned and sequenced (SEQ ID NOS.: 1-16,
Figure 1B). Comparing these sequences revealed the consensus binding site 5'-TAATTANGGGNG-3'
(SEQ ID NO.: 17). The 5' half of this consensus, TAATTA, resembled a canonical homeodomain
binding site TAATNN (Laughon (1991)), and matched the site (TAATNA) that is preferred by
the Oct-1 homeodomain in the absence of the POU-specific domain (Verrijzer et al, EMBO J.
11:4993 (1992)). The 3' half of the consensus, NGGGNG, was consistent with adjacent binding sites
for fingers 2 (TGG) and 1 (GCG) of Zif268.

Binding studies were performed in order to determine the ability of the chimeric protein
ZFHD1 to distinguish the consensus sequence from the sequences recognized by the component
polypeptides of the composite DNA-binding region. ZFHD1, the Oct-1 POU domain (containing
a homeodomain and a POU-specific domain), and the three zinc fingers of Zif268 were compared
for their abilities to distinguish among the Oct-1 site 5'-ATGCAAATGA-3' (SEQ ID NO.: 20),
the Zif268 site 5'-GCGTGGGCG-3' and the hybrid binding site 5'-TAATGATGGGCG-3' (SEQ ID
NO.: 21). The chimeric protein ZFHD1 preferred the optimal hybrid site to the octamer site by a
factor of 240 and did not bind to the Zif site. The POU domain of Oct-1 bound to the octamer site
with a dissociation constant of 1.8 x 10^-[illegible] M under the assay conditions used, preferring
this site to the hybrid sequences by factors of 10 and 30, and did not bind to the Zif site. The
three zinc fingers of Zif268 bound to the Zif site with a dissociation constant of
33 x 10^-[illegible] M and did not bind to the other three sites. These experiments show that ZFHD1
binds tightly and specifically to the hybrid site and displays DNA-binding specificity that is
clearly distinct from that of either of the original proteins.

In order to determine whether the novel DNA-binding protein could function in vivo,
ZFHD1 was fused to a transcriptional activation domain to generate a transcription factor, and
transfection experiments were performed (see Example 5). An expression plasmid encoding
ZFHD1 fused to the carboxyl-terminal 81 amino acids of the Herpes Simplex Virus VP16 protein
(ZFHD1-VP16) was co-transfected into 293 cells with reporter constructs containing the SV40
promoter and the firefly luciferase gene (Figure 3). To determine whether the chimeric protein
could specifically regulate gene expression, reporter constructs containing two tandem copies of
either the ZFHD1 site 5'-TAATGATGGGCG-3' (SEQ ID NO.: 21), the octamer site
5'-ATGCAAATGA-3' (SEQ ID NO.: 20) or the Zif site 5'-GCGTGGGCG-3' inserted upstream of the
SV40 promoter were tested. When the reporter contained two copies of the ZFHD1 site, the
ZFHD1-VP16 protein stimulated the activity of the promoter in a dose-dependent manner.
Furthermore, the stimulatory activity was specific for the promoter containing the ZFHD1
binding sites. At levels of protein which stimulated this promoter by 44-fold, no stimulation
above background was observed for promoters containing the octamer or Zif sites. Thus, ZFHD1
efficiently and specifically recognized its target site in vivo.
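
The "fold" stimulation and dose dependence referred to above can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, fold stimulation is taken as reporter activity with activator divided by reporter-only background, and the luciferase readings are invented for illustration. They are not data from this application.

```python
def fold_activation(with_activator, without_activator):
    """Reporter activity with the activator divided by activity without it."""
    if without_activator <= 0:
        raise ValueError("background reporter activity must be positive")
    return with_activator / without_activator


def is_dose_dependent(fold_by_dose):
    """True if fold activation never decreases as the plasmid dose increases."""
    folds = [fold for _, fold in sorted(fold_by_dose.items())]
    return all(a <= b for a, b in zip(folds, folds[1:]))


# Hypothetical luciferase readings (arbitrary light units), keyed by
# micrograms of activator plasmid; 0.0 is the reporter-only background.
readings = {0.0: 100.0, 0.1: 500.0, 0.5: 2100.0, 1.0: 4400.0}
background = readings[0.0]
folds = {dose: fold_activation(signal, background)
         for dose, signal in readings.items() if dose > 0}

print(folds)
print("dose-dependent:", is_dose_dependent(folds))
```

The same calculation applied to reporters carrying non-target sites would be expected to give values near 1 if the activator is specific.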

Utilizing the above-described procedures and known DNA-binding domains, other novel
chimeric transcription factor proteins can be constructed. These chimeric proteins can be studied
as disclosed herein to determine the consensus binding sequence of the chimeric protein. The
binding specificity, as well as the in vivo activity, of the chimeric protein can also be
determined using the procedures illustrated herein. Thus, the methods of this invention can be
utilized to create various chimeric proteins from the domains of DNA-binding proteins.

14. Optimization and Engineering of composite DNA-binding regions 
The useful range of composite DNA binding regions is not limited to the specificities that
can be obtained by linking two naturally occurring DNA binding subdomains. A variety of
mutagenesis methods can be used to alter the binding specificity. These include use of the crystal
or NMR structures (3D) of complexes of a DNA-binding domain (DBD) with DNA to rationally
predict (an) amino acid substitution(s) that will alter the nucleotide sequence specificity of DNA
binding, in combination with computational modelling approaches. Candidate mutants can then
be engineered and expressed and their DNA binding specificity identified using oligonucleotide
site selection and DNA sequencing, as described earlier.
An alternative approach to generating novel sequence specificities is to use databases of
known homologs of the DBD to predict amino acid substitutions that will alter binding. For
example, analysis of databases of zinc finger sequences has been used to alter the binding
specificity of a zinc finger (Desjarlais and Berg (1993) Proc. Natl. Acad. Sci. USA 90, 2256-2260).
A further and powerful approach is random mutagenesis of amino acid residues which
may contact the DNA, followed by screening or selection for the desired novel specificity.
Preferably, the libraries are surveyed using phage display so that mutants can be directly
selected. For example, phage display of the three fingers of Zif268 (including the two
incorporated into ZFHD1) has been described, and random mutagenesis and selection has been
used to alter the specificity and affinity of the fingers (Rebar and Pabo (1994) Science 263,
671-673; Jamieson et al, (1994) Biochemistry 33, 5689-5695; Choo and Klug (1994) Proc. Natl. Acad.
Sci. USA 91, 11163-11167; Choo and Klug (1994) Proc. Natl. Acad. Sci. USA 91, 11168-11172; Choo
et al (1994) Nature 372, 642-645; Wu et al (1995) Proc. Natl. Acad. Sci. USA 92, 344-348). These
mutants can be incorporated into ZFHD1 to provide new composite DNA binding regions with
novel nucleotide sequence specificities. Other DBDs may be similarly altered. If structural
information is not available, general mutagenesis strategies can be used to scan the entire domain
for desirable mutations: for example alanine-scanning mutagenesis (Cunningham and Wells
(1989) Science 244, 1081-1085), PCR misincorporation mutagenesis (see e.g. Cadwell and Joyce
(1992) PCR Meth. Applic. 2, 28-33), and "DNA shuffling" (Stemmer (1994) Nature 370, 389-391).
These techniques produce libraries of random mutants, or sets of single mutants, that can then be
readily searched by screening or selection approaches such as phage display.
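
As an illustration of what a "set of single mutants" looks like for alanine scanning, the sketch below enumerates every single alanine substitution of a protein sequence at the design level. The function name and the five-residue example sequence are hypothetical; the code only lists candidate sequences and does not model the mutagenesis chemistry or the cited protocols.

```python
def alanine_scan(protein, positions=None):
    """Yield (label, mutant_sequence) for each single alanine substitution.

    `positions` are 1-based residue numbers; by default every residue is
    scanned. Residues that are already alanine are skipped.
    """
    if positions is None:
        positions = range(1, len(protein) + 1)
    for pos in positions:
        wild_type = protein[pos - 1]
        if wild_type == "A":
            continue
        mutant = protein[:pos - 1] + "A" + protein[pos:]
        yield f"{wild_type}{pos}A", mutant


# Hypothetical five-residue segment, for illustration only.
for label, mutant in alanine_scan("RSDAL"):
    print(label, mutant)
```

Passing a restricted `positions` list would limit the scan to residues thought to contact the DNA, in line with the targeted approach described earlier in this section.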

In all these approaches, mutagenesis can be carried out directly on the composite DNA
binding region, or on the individual subdomain of interest in its natural or other protein context.
In the latter case, the engineered component domain with new nucleotide sequence specificity
may be subsequently incorporated into the composite DNA binding region in place of the starting
component. The new DNA binding specificity may be wholly or partially different from that of
the initial protein: for example, if the desired binding specificity contains (a) subsite(s) for
known DNA binding subdomains, other subdomains can be mutated to recognize adjacent sequences
and then combined with the natural domain to yield a composite DNA binding region with the
desired specificity.

Randomization and selection strategies may be used to incorporate other desirable
properties into the composite DNA binding regions in addition to altered nucleotide recognition
specificity, by imposing an appropriate in vitro selective pressure (for review see Clackson and
Wells (1994) Trends Biotech. 12, 173-184). These include improved affinity, improved stability
and improved resistance to proteolytic degradation.

The ability to engineer binding regions with novel DNA binding specificities permits
composite DNA binding regions to be designed and produced to interact specifically with any
desired nucleotide sequence. Thus a clinically interesting sequence may be chosen and a composite
DNA binding region engineered to recognize it. For example, a composite DNA binding region may
be designed to bind chromosomal breakpoints and repress transcription of an otherwise activated
oncogene (see Choo et al (1994) Nature 372, 642-645); to bind viral DNA or RNA genomes and
block or activate expression of key viral genes; or to specifically bind the common mutated
versions of a mutational hotspot sequence in an oncogene and repress transcription (such as the
mutation of codon 21 of human ras), and analogously to bind mutated tumor suppressor genes and
activate their transcription.

Additionally, in optimizing chimeric proteins of this invention it should be appreciated
that immunogenicity of a polypeptide sequence is thought to require the binding of peptides by
MHC proteins and the recognition of the presented peptides as foreign by endogenous T-cell
receptors. It may be preferable, at least in gene therapy applications, to alter a given foreign
peptide sequence to minimize the probability of its being presented in humans. For example,
peptide binding to human MHC class I molecules has strict requirements for certain residues at key
"anchor" positions in the bound peptide: e.g. HLA-A2 requires leucine, methionine or isoleucine at
position 2 and leucine or valine at the C-terminus (for review see Stern and Wiley (1994)
Structure 2, 245-251). Thus in engineered proteins, this periodicity of these residues could be
avoided.
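
The anchor-residue rule quoted above for HLA-A2 can be turned into a simple screen of an engineered sequence. This sketch rests on assumptions that are not in the text: a window length of nine residues is used as a default, the function name and example sequence are hypothetical, and a hit only flags a window carrying both anchor residues. It does not predict actual peptide binding or presentation.

```python
def hla_a2_anchor_windows(protein, length=9):
    """Return (start, peptide) for windows carrying the anchor pattern
    described in the text: L, M or I at position 2 and L or V at the
    C-terminal position. `start` is a 1-based residue number.

    The default window length of 9 is an assumption made for this sketch.
    """
    hits = []
    for i in range(len(protein) - length + 1):
        peptide = protein[i:i + length]
        if peptide[1] in "LMI" and peptide[-1] in "LV":
            hits.append((i + 1, peptide))
    return hits


# Hypothetical sequence, for illustration only.
print(hla_a2_anchor_windows("KGLAAAAAAVK"))
```

Windows flagged in this way are candidates for the kind of sequence alteration suggested in the paragraph above, for example substituting one of the two anchor residues where function permits.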

15. Tissue-specific or Cell-type Specific Expression
It may be preferred in certain embodiments that the chimeric protein(s) of this invention
be expressed in a cell-specific or tissue-specific manner. Such specificity of expression may be
achieved by operably linking one or more of the DNA sequences encoding the chimeric protein(s)
to a cell-type specific transcriptional regulatory sequence (e.g. promoter/enhancer). Numerous
cell-type specific transcriptional regulatory sequences are known which may be used for this
purpose. Others may be obtained from genes which are expressed in a cell-specific manner.

For example, constructs for expressing the chimeric proteins may contain regulatory 
sequences derived from known genes for specific expression in selected tissues. Representative 
examples are tabulated below: 








gene 


reference 1 


Vlens 


72-crystallin 


Brcitmaiv M.U Clapoft Rossant, U Tsui, L.C., Golde, L.M., Maxwell, LH., 1 
Bemstin, A. (1987) Genetic Ablation: targeted expression of a toxin gene causes | 
microphthalmia in transgenic mice. Scimcr 238: 1563-1565 




ctA-crystallin 


Landel C Zhao, J., Bok, Evans, GjV. (1988) Lens^pedfic expression of a 
recombinant ridn induces devebpmental defiects in the eyes of transgenic mice. 
Gtnes Dev. 2: 1168-78 

Kaur, S., Key, B., Stock,)., McNeish, J D., Akeson, Potter, SS. (1989) Targeted 
ablation of alpha-crystallin-synthesizing cells produces lens-defident eyes in 
transgenic mice. DmloTrnxent 105: 613-619 


1 pituitary 
^somatr^hic 
1 cells 


Growth honnone 


Behringer, RR., Matthews, LS., Palmiter, R.D., Brinster, R.L. (1988) Dwarf mice 
produced by genetic ablation of growth hormone-expressing cells. Genes Dev, 2: 
453-461 


pancreas 
* 


Insxilin- 

Elastase - acinar 
cell specific 


Omitz, D.M^ Palmiter, R.D^ Hamma, R^, Brinster. R.L, Swift, G.R, 
MacDonald, R.J. (1985) Specific expression of an elastase-human growth fusion 
in parKreatic acinar cells of transgeneic mice. Nature 131: 600-603 

Palmiter, R.D., Behringer, RJ^, Quaife. CJ-, MaxweU, F., Maxwell, I.H., Brinster, 
R.L. (1987) Cell lineage ablation in transgeneic mice by cell-specific expression 
of a toxin gene. Cell 50: 435-443 


T cells 


Ick promoter 


ChaffiiV K.E., Beals, C.R., WOkie, T.M., Forbush, K,A„ Simon, Mi., Perlmutter, 
R.M. (1990) EMBO Journal 9: 3821-3829 


B cells 


bnmunogbbulin 
kappa light chain 


Borelli, E., Heyman, R., Hsi, M., Evans, R^. (1988) Targeting of an inducible 
toxic phenotype in animal cells. Proc, Nati Acad, Sd USA 85: 7572-7576 

Heyman, KA., Borrelli, E, Lesley, J., Anderson, D., Richmond, D.D., Baird, S.M., 
Hyrrum, R., Evaiu, R.M. (1989) Thymidine kinase obliteration: creation of 
transgenic mice with controlled immuiKxieficiendes. Proc NatL Acad, Sci, USA 
86: 2698-2702 


Schwann 
cells 


Pq promoter 


Messing, A-, B^iringer, RJt, Hammang, JP. Palmiter, RD, Brinster, RU Lemke, G. 
,P0 promoter directs espression of lepoiter and toxin genes to Schwann cells of 
transgenic mice, hieuron 8: 507-520 1992 




Myelin basic 
protein 


Miskimins, R. Krupp, L., £)ewey,M], Zhang, X Cell and tissue-specific 
expression of a heterologous gene under control of the myelin basic protein gene 
promoter in trangenic mice. Brain Res Dev Brain Res 1992 Vol 65: 217-21 



31 



wo 96/20951 



PCTAJS95/1698Z 



spennatids 


protamine 


(1990) Genetic ablation in transgenic mice with attenuated diphtheria toxin A 
gene. Mol Cell Biol, 10: 474-479 


lung 


Lung sux^icant 
gene 


MacDonald, R.J. (1985) Specific expression of an dastase-htiman growth fusion 
in pancreatic acinar cells of transgeneic mice. Nature 131: 600-603 


1 adipocyte 


rZ 


Rnee C R Rmvov R A CniMrplmsn Tartrrtpd P¥nTP4.cinn nf s fmvtn oono ¥/\ 

adipose tissue: transgenic mice resistant to obesity Genes and Dev 7: 131&-24 
1993 


llinuscte 


myosin light 
chain 


Lee, KJ, Ross, RS, Rockman, HA, Harris, AN, O'Brien, TX, van-Bilsen, M,
Shubeita, HE, Kandolf, R., Brem, G., Price, J., et al. J. Biol. Chem. 1992 Aug 5, 267:
15875-85




Alpha actin 


Muscat, G.E., Perry, S., Prentice, H., Kedes, L. The human skeletal alpha-actin
gene is regulated by a muscle-specific enhancer that binds three nuclear factors.
Gene Expression 2, 111-26, 1992


neurons 


neurofilament
proteins


Reeben, M., Halmekyto, M., Alhonen, L., Sinervirta, R., Saarma, M., Janne, J.
Tissue-specific expression of rat light neurofilament promoter-driven reporter gene in
transgenic mice. BBRC 1993: 192: 465-70


liver 


tyrosine aminotransferase,
albumin, apolipoproteins







Identification of tissue specific promoters 

To identify the sequences that control the tissue- or cell-type specific expression of a
gene, one isolates a genomic copy of the selected gene including sequences "upstream" from the
exons that code for the protein.



[Schematic: 5' flanking sequences | coding sequences]



These upstream sequences are then usually fused to an easily detectable reporter gene
like beta-galactosidase, in order to be able to follow the expression of the gene under the control
of upstream regulatory sequences.






[Schematic: 5' flanking sequences | reporter gene]

To establish which upstream sequences are necessary and sufficient to control gene
expression in a cell-type specific manner, the complete upstream sequences are introduced into
the cells of interest to determine whether the initial clone contains the control sequences.
Reporter gene expression is monitored as evidence of expression.



[Schematic: series of progressively shorter 5' flanking sequences (deletions) fused to the reporter gene]



If these sequences contain the necessary sequences for cell-type specific expression,
deletions (shown schematically above) may be made in the 5' flanking sequences to determine
which sequences are minimally required for cell-type specific expression. This can be done by
making transgenic mice with each construct and monitoring beta-gal expression, or by first
examining the expression in specific cultured cells, with comparison to expression in non-specific
cultured cells.

Several successive rounds of deletion analysis normally pinpoint the minimal sequences
required for tissue specific expression. Ultimately, these sequences are then introduced into
transgenic mice to confirm that the expression is only detectable in the cells of interest.
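
By way of illustration only, choosing the minimal construct from a deletion series might be sketched as follows. The construct lengths, the reporter readings, the 10-fold specificity threshold and every name in the sketch are hypothetical assumptions and are not taken from this disclosure.

```python
from typing import NamedTuple, Optional, Sequence


class Construct(NamedTuple):
    upstream_bp: int        # length of 5' flanking sequence retained (hypothetical)
    target_cells: float     # reporter activity in the cell type of interest
    control_cells: float    # reporter activity in a non-specific cell line


def minimal_specific_construct(
    constructs: Sequence[Construct], min_ratio: float = 10.0
) -> Optional[Construct]:
    """Return the shortest construct whose target/control ratio is >= min_ratio.

    Constructs with no measurable control activity are skipped, because the
    ratio is undefined for them.
    """
    specific = [
        c for c in constructs
        if c.control_cells > 0 and c.target_cells / c.control_cells >= min_ratio
    ]
    return min(specific, key=lambda c: c.upstream_bp) if specific else None


# Hypothetical reporter readings (arbitrary units), for illustration only.
series = [
    Construct(2000, 95.0, 1.0),
    Construct(1000, 90.0, 1.1),
    Construct(500, 88.0, 0.9),
    Construct(250, 3.0, 1.0),   # specificity lost after this deletion
]

best = minimal_specific_construct(series)
print(best.upstream_bp if best else "no specific construct")
```

In practice the threshold would be chosen from the behavior of the full-length construct, and the candidate would still be confirmed in transgenic mice as described above.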



16. Applications 

A. Constitutive gene therapy. Gene therapy often requires controlled high-level
expression of a therapeutic gene, sometimes in a cell-type specific pattern. By supplying
saturating amounts of an activating transcription factor of this invention to the therapeutic gene,
considerably higher levels of gene expression can be obtained relative to natural promoters or
enhancers, which are dependent on endogenous transcription factors. Thus, one application of this
invention to gene therapy is the delivery of a two-transcription-unit cassette (which may reside
on one or two plasmid molecules, depending on the delivery vector) consisting of (1) a
transcription unit encoding a chimeric protein composed of a composite DNA-binding region of
this invention and a strong transcription activation domain (e.g., derived from the VP16 protein,
p65 protein, etc.) and (2) a transcription unit consisting of the therapeutic gene expressed under
the control of a minimal promoter carrying one, and preferably several, binding sites for the
composite DNA-binding domain. Cointroduction of the two transcription units into a cell results
in the production of the hybrid transcription factor, which in turn activates the therapeutic gene
to high level. This strategy essentially incorporates an amplification step, because the promoter
that would be used to produce the therapeutic gene product in conventional gene therapy is used
instead to produce the activating transcription factor. Each transcription factor has the
potential to direct the production of multiple copies of the therapeutic protein.

This method may be employed to increase the efficacy of many gene therapy strategies
by substantially elevating the expression of the therapeutic gene, allowing expression to reach
therapeutically effective levels. Examples of therapeutic genes that would benefit from this
strategy are genes that encode secreted therapeutic proteins, such as cytokines, growth factors
and other protein hormones, antibodies, and soluble receptors. Other candidate therapeutic
genes are disclosed in PCT/US93/01617.

B. Regulated gene therapy. In many instances, the ability to switch a therapeutic gene on
and off at will or the ability to titrate expression with precision is essential to therapeutic
efficacy. This invention is particularly well suited for achieving regulated expression of a target
gene. Two examples of how regulated expression may be achieved are described. The first
involves a recombinant transcription factor which comprises a composite DNA-binding domain, a
potent transcriptional activation domain, and a regulatory domain controllable by a small
orally-available ligand. One example is the ligand-binding domain of steroid receptors, in
particular the domain derived from the modified progesterone receptor described by Wang et al.,
1994, Proc. Natl. Acad. Sci. USA 91:8180-8184. In this example, the composite DNA binding domain
of this invention is used in place of the GAL4 domain in the recombinant transcription factor and
the target gene is linked to a DNA sequence recognized by the composite DNA binding domain.
Such a design permits the regulation of a target gene by known anti-progestins such as RU486.
The transcription factors described here greatly enhance the efficacy of this regulatory domain
because of the enhanced affinity of the DNA-binding domain and the absence of background
activity that arises from ligand-independent dimerization directed by the GAL4 domain in
published constructs.

Another example involves a pair of chimeric proteins, a dimerizing agent capable of
dimerizing the chimeras and a target gene construct to be expressed. The first chimeric protein
comprises a composite DNA-binding region as described herein and one or more copies of one or
more receptor domains (e.g. FKBP, cyclophilin, FRB region of FRAP, etc.) for which a ligand,
preferably a high-affinity ligand, is available. The second chimeric protein comprises an
activation domain and one or more copies of one or more receptor domains (which may be the
same as or different from those on the prior chimeric protein). The dimerizing reagent is capable of
binding to the receptor (or "ligand binding") domains present on each of the chimeras and thus of
dimerizing or oligomerizing the chimeras. DNA molecules encoding and directing the expression
of these chimeric proteins are introduced into the cells to be engineered. Also introduced into the
cells is a target gene linked to a DNA sequence to which the composite DNA-binding domain is
capable of binding (if not already present within the cells). Contacting the engineered cells or
their progeny with the oligomerizing reagent leads to regulated activity of the transcription
factor and hence to expression of the target gene. In cases where the target gene and recognition
sequence are already present within the cell, the activation domain may be replaced by a
transcription repressing domain for regulated inhibition of expression of the target gene. The
design and use of similar components is disclosed in PCT/US93/01617. These may be adapted to
the present invention by the use of a composite DNA-binding domain, and DNA sequence
encoding it, in place of the alternative DNA-binding domains as disclosed in the referenced
patent document.

The dimerizing ligand may be administered to the patient as desired to activate
transcription of the target gene. Depending upon the binding affinity of the ligand, the response
desired, the manner of administration, the half-life, and the number of cells present, various
protocols may be employed. The ligand may be administered parenterally or orally. The number
of administrations will depend upon the factors described above. The ligand may be taken orally
as a pill, powder, or dispersion; buccally; sublingually; injected intravascularly,
intraperitoneally, subcutaneously; by inhalation, or the like. The ligand (and monomeric
antagonist compound) may be formulated using conventional methods and materials well known
in the art for the various routes of administration. The precise dose and particular method of
administration will depend upon the above factors and be determined by the attending physician
or human or animal healthcare provider. For the most part, the manner of administration will
be determined empirically.




In the event that transcriptional activation by the ligand is to be reversed or terminated,
a monomeric compound which can compete with the dimerizing ligand may be administered.
Thus, in the case of an adverse reaction or the desire to terminate the therapeutic effect, an
antagonist to the dimerizing agent can be administered in any convenient way, particularly
intravascularly, if a rapid reversal is desired. Alternatively, one may provide for the presence
of an inactivation domain (or transcriptional silencer) with a DNA binding domain. In another
approach, cells may be eliminated through apoptosis via signaling through Fas or TNF receptor
as described elsewhere. See International Patent Applications PCT/US94/01617 and
PCT/US94/08008.

The particular dosage of the ligand for any application may be determined in accordance
with the procedures used for therapeutic dosage monitoring, where maintenance of a particular
level of expression is desired over an extended period of time, for example, greater than about
two weeks, or where there is repetitive therapy, with individual or repeated doses of ligand
over short periods of time, with extended intervals, for example, two weeks or more. A dose of
the ligand within a predetermined range would be given and monitored for response, so as to
obtain a time-expression level relationship, as well as observing therapeutic response.
Depending on the levels observed during the time period and the therapeutic response, one could
provide a larger or smaller dose the next time, following the response. This process would be
iteratively repeated until one obtained a dosage within the therapeutic range. Where the
ligand is chronically administered, once the maintenance dosage of the ligand is determined, one
could then do assays at extended intervals to be assured that the cellular system is providing the
appropriate response and level of the expression product.
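
The iterative adjustment described above can be pictured, purely as a schematic, with the following Python sketch. It assumes a hypothetical saturating dose-response function, an arbitrary 1.5-fold adjustment step and an arbitrary target window; none of these values comes from this disclosure, and the sketch is not a dosing protocol.

```python
from typing import Callable, Tuple


def titrate_dose(
    measure_expression: Callable[[float], float],
    start_dose: float,
    low: float,
    high: float,
    step: float = 1.5,
    max_rounds: int = 20,
) -> Tuple[float, float]:
    """Give a dose, observe the expression level, then raise or lower the next
    dose until the observed level falls inside the window [low, high]."""
    dose = start_dose
    for _ in range(max_rounds):
        level = measure_expression(dose)
        if low <= level <= high:
            return dose, level
        # Too little expression: larger dose next time; too much: smaller dose.
        dose = dose * step if level < low else dose / step
    raise RuntimeError("no dose within the target window was found")


def toy_response(dose: float) -> float:
    """Hypothetical saturating response (arbitrary units), illustration only."""
    return 100.0 * dose / (dose + 5.0)


dose, level = titrate_dose(toy_response, start_dose=1.0, low=40.0, high=60.0)
print(dose, round(level, 1))
```

A fixed multiplicative step can overshoot a narrow window, which is why the sketch stops after a bounded number of rounds rather than looping indefinitely.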

It should be appreciated that the system is subject to many variables, such as the cellular
response to the ligand, the efficiency of expression and, as appropriate, the level of secretion,
the activity of the expression product, the particular need of the patient, which may vary with
time and circumstances, the rate of loss of the cellular activity as a result of loss of cells or
expression activity of individual cells, and the like. Therefore, it is expected that for each
individual patient, even if there were universal cells which could be administered to the
population at large, each patient would be monitored for the proper dosage for the individual.






C. Gene Therapy: endogenous genes 

This invention is adaptable to a number of approaches for gene therapy involving
regulation of transcription of a gene which is endogenous to the engineered cells. These
approaches involve the use of a chimeric protein as a transcription factor to actuate or increase
the transcription of an endogenous gene whose gene product is beneficial or to inhibit the
transcription of an endogenous gene whose gene product is excessive, disease-causing or otherwise
detrimental.

In one approach, a composite DNA-binding domain is designed or selected which is
capable of binding to an endogenous nucleotide sequence linked to the endogenous gene of interest,
e.g., a nucleotide sequence located within or in the vicinity of the promoter region or elsewhere in
the DNA sequence flanking the endogenous gene's coding region. Alternatively, a known
recognition sequence for a composite DNA-binding region may be introduced in proximity to a
selected endogenous gene by homologous recombination to render the endogenous gene responsive
to a corresponding chimeric transcription factor of this invention. See e.g. Gu et al., Science 265,
103-106 (1994). Constructs are made as described elsewhere which encode a chimeric protein
containing the composite DNA-binding region and a transcription activation domain.
Introduction into cells of the DNA construct permitting expression of the chimeric transcription
factor leads to specific activation of transcription of the endogenous gene linked to the
recognition sequence for the chimeric protein. Repression or inhibition of expression of the target
gene may be effected using a chimeric protein containing the composite DNA-binding region,
which may also contain an optional transcription inhibiting domain as described elsewhere.
Again, as discussed elsewhere, the DNA construct may be designed to permit regulated
expression of the chimeric protein, e.g. by use of an inducible promoter or by use of any of the
regulatable gene therapy approaches which are known in the art. Likewise the construct may be
under the control of a tissue specific promoter or enhancer, permitting tissue-specific or
cell-type-specific expression of the chimera and regulation of the endogenous gene. Finally, it
should be noted that constructs encoding a pair of transcription factors containing ligand-binding
domains permitting ligand-dependent function may be used in place of a single transcription
factor construct.

D. Production of recombinant proteins and viruses. Production of recombinant therapeutic
proteins for commercial and investigational purposes is often achieved through the use of
mammalian cell lines engineered to express the protein at high level. The use of mammalian
cells, rather than bacteria or yeast, is indicated where the proper function of the protein requires
post-translational modifications not generally performed by heterologous cells. Examples of
proteins produced commercially this way include erythropoietin, tissue plasminogen activator,
clotting factors such as Factor VIII:c, antibodies, etc. The cost of producing proteins in this
fashion is directly related to the level of expression achieved in the engineered cells. Thus,
because the constitutive two-transcription-unit system described above can achieve considerably
higher expression levels than conventional expression systems, it may greatly reduce the cost of
protein production. A second limitation on the production of such proteins is toxicity to the host
cell: protein expression may prevent cells from growing to high density, sharply reducing
production levels. Therefore, the ability to tightly control protein expression, as described for
regulated gene therapy, permits cells to be grown to high density in the absence of protein
production. Only after an optimum cell density is reached is expression of the gene activated and
the protein product subsequently harvested.

A similar problem is encountered in the construction and use of "packaging lines" for the
production of recombinant viruses for commercial (e.g., gene therapy) and experimental use.
These cell lines are engineered to produce viral proteins required for the assembly of infectious
viral particles harboring defective recombinant genomes. Viral vectors that are dependent on
such packaging lines include retrovirus, adenovirus, and adeno-associated virus. In the latter
case, the titer of the virus stock obtained from a packaging line is directly related to the level of
production of the viral rep and core proteins. But these proteins are highly toxic to the host cells.
Therefore, it has proven difficult to generate high-titer recombinant viruses. This invention
provides a solution to this problem, by allowing the construction of packaging lines in which the
rep and core genes are placed under the control of regulatable transcription factors of the design
described here. The packaging cell line can be grown to high density, infected with helper virus,
and transfected with the recombinant viral genome. Then, expression of the viral proteins
encoded by the packaging cells is induced by the addition of dimerizing agent to allow the
production of virus at high titer.

E. Use of chimeric DBDs as genomic labelling reagents. Chimeric proteins containing a
composite DNA binding region can be used to label recognized nucleotide sequences in DNA
molecules, including whole genome preparations such as chromosome spreads and immobilized
DNA matrices, that contain the specific recognition sites. This approach may be used for
localizing these sequences to specific chromosomal regions after their introduction into genomic
DNA, for example in a retroviral vector for a gene therapy application. More generally,
chimeric proteins containing a composite DNA binding region may be used as reagents to reveal
the location of their nucleotide recognition sites for applications such as gene mapping, where
they may be used as cytogenetic markers. DNA binding by composite DNA binding regions may
have advantages over techniques such as fluorescence in situ hybridization (FISH) in that shorter
nucleotide sequences could be specifically recognized. These approaches require the chimeric
protein to be labelled in a way that can be readily visualized, for example by tagging with an
epitope such as glutathione-S-transferase (GST) or the haemagglutinin (HA) tag, which can be
detected e.g. by immunological and colorimetric methods; by biotinylation followed by detection
with streptavidin; or by fusion to a directly detectable moiety such as green fluorescent protein
(GFP).

F. Biological research. This invention is applicable to a wide range of biological
experiments in which precise recognition of a target gene is desired. These include: (1) expression
of a protein or RNA of interest for biochemical purification; (2) regulated expression of a protein
or RNA of interest in tissue culture cells for the purposes of evaluating its biological function; (3)
regulated expression of a protein or RNA of interest in transgenic animals for the purposes of
evaluating its biological function; (4) regulating the expression of another regulatory protein
that acts on an endogenous gene for the purposes of evaluating the biological function of that
gene. Transgenic animal models and other applications in which the composite DNA-binding
domains of this invention may be used include those disclosed in US Patent Application Serial
Nos. 08/292395 and 08/292396 (filed August 18, 1994).

G. Kits. This invention further provides kits useful for the foregoing applications. One
such kit contains a first DNA sequence encoding a chimeric protein comprising a composite DNA
binding region of this invention (and may contain additional domains as discussed above) and a
second DNA sequence containing a target gene linked to a DNA sequence to which the chimeric
protein is capable of binding. Alternatively, the second DNA sequence may contain a cloning site
for insertion of a desired target gene by the practitioner. For regulatable applications, i.e., in
cases in which the recombinant protein contains a composite DNA-binding domain and a receptor
domain, the kit may further contain a third DNA sequence encoding a transcriptional activating
domain and a second receptor domain, as discussed above. Such kits may also contain a sample of
a dimerizing agent capable of dimerizing the two recombinant proteins and activating
transcription of the target gene.

The following examples contain important additional information, exemplification and
guidance which can be adapted to the practice of this invention in its various embodiments and
the equivalents thereof. The examples are offered by way of illustration and not by way of
limitation.




EXAMPLES

The following examples describe the design, construction and use of chimeric proteins
containing a composite DNA-binding region, identification of a consensus nucleic acid sequence
bound by the composite DNA-binding region, assessment of its binding specificity and
demonstration of its in vivo activity. The teachings of references cited herein are hereby
incorporated by reference.

Example 1: Computer Modeling 

Computer modeling studies (PROTEUS and MOGLI) were used to visualize how zinc
fingers might be fused to the Oct-1 homeodomain. The known crystal structures of the Zif268-DNA
(Pavletich and Pabo, Science 252:809 (1991)) and Oct-1-DNA (Klemm et al., Cell 77:21
(1994)) complexes were aligned by superimposing phosphates of the double helices in several
different orientations. This study yielded two arrangements which appeared to be suitable for
use in a chimeric protein.

Each model was constructed by juxtaposing portions of two different crystallographically
determined protein-DNA complexes. Models were initially prepared by superimposing
phosphates of the double helices in various registers and were analyzed to see how the
polypeptide chains might be connected. Superimposing sets of phosphates typically gave root
mean squared distances of 0.5-1.5 Å between corresponding atoms. These distances gave some
perspective on the error limits involved in modeling, and uncertainties about the precise
arrangements were one of the reasons for using a flexible linker containing several glycines.
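
For readers unfamiliar with the measure, the root mean squared distance between corresponding atoms can be computed as in the Python sketch below. The sketch assumes the two coordinate sets have already been superimposed; it does not perform the superposition itself, and the coordinates shown are hypothetical toy values rather than atoms from either crystal structure.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean squared distance between paired atoms of two coordinate sets
    that are assumed to be already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Hypothetical phosphate positions (in Å), for illustration only.
set_a = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (6.0, 0.0, 1.0)]
print(rmsd(set_a, set_b))  # every pair is 1 Å apart, so the result is 1.0
```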

In one alignment, the carboxyl-terminal region of zinc finger 2 was 8.8 Å away from the
amino-terminal region of the homeodomain, suggesting that a short polypeptide linker could
connect these domains. In this model the chimeric protein would bind a hybrid DNA site with
the sequence 5'-AAATNNTGGGCG-3' (SEQ ID NO.: 18). The Oct-1 homeodomain would
recognize the AAAT subsite, zinc finger 2 would recognize the TGG subsite, and zinc finger 1 would
recognize the GCG subsite. No risk of steric interference between the domains was apparent in
this model.

The second plausible arrangement would also have a short polypeptide linker connecting
zinc finger 2 to the homeodomain (a distance of less than 10 Å); however, the subsites are
arranged so that the predicted binding sequence is 5'-CGCCCANNAAAT-3' (SEQ ID NO.: 19).
This model was not explicitly used in the subsequent studies, although it is possible that the
flexible linker will also allow ZFHD1 to recognize this site.
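
The relationship between the two predicted sites can be checked with a short Python sketch. The subsite sequences are those given above; the variable and function names are illustrative assumptions. In the second arrangement the zinc finger subsites appear as the reverse complement of TGGGCG, placed 5' of the AAAT subsite.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


HOMEODOMAIN_SUBSITE = "AAAT"   # recognized by the Oct-1 homeodomain
FINGER2_SUBSITE = "TGG"        # recognized by zinc finger 2
FINGER1_SUBSITE = "GCG"        # recognized by zinc finger 1
SPACER = "NN"                  # two unspecified bases between the subsites

zinc_finger_site = FINGER2_SUBSITE + FINGER1_SUBSITE

first_arrangement = HOMEODOMAIN_SUBSITE + SPACER + zinc_finger_site
second_arrangement = revcomp(zinc_finger_site) + SPACER + HOMEODOMAIN_SUBSITE

print(first_arrangement)    # AAATNNTGGGCG (SEQ ID NO.: 18)
print(second_arrangement)   # CGCCCANNAAAT (SEQ ID NO.: 19)
```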






Example 2: Construction of a Chimeric Protein

The design strategy was tested by construction of a chimeric protein, ZFHD1, that
contained fingers 1 and 2 of Zif268, a glycine-glycine-arginine-arginine linker, and the Oct-1
homeodomain (Figure 1A). A fragment encoding Zif268 residues 333-390 (Christy et al., Proc.
Natl. Acad. Sci. USA 85:7857 (1988)), two glycines and the Oct-1 residues 378-439 (Sturm et al.,
Genes & Development 2:1582 (1988)) was generated by polymerase chain reaction, confirmed by
dideoxysequencing, and cloned into the BamHI site of pGEX2T (Pharmacia) to generate an
in-frame fusion to glutathione S-transferase (GST). The GST-ZFHD1 protein was expressed by
standard methods (Ausubel et al., Eds., Current Protocols in Molecular Biology (John Wiley
& Sons, New York, 1994)), purified on Glutathione Sepharose 4B (Pharmacia) according to the
manufacturer's protocol, and stored at -80°C in 50 mM Tris pH 8.0, 100 mM KCl, and 10% glycerol.
Protein concentration was estimated by densitometric scanning of Coomassie-stained SDS
PAGE-resolved proteins using bovine serum albumin (Boehringer-Mannheim Biochemicals) as
standard. The DNA-binding activity of this chimeric protein was determined by selecting binding
sites from a random pool of oligonucleotides.

Example 3: Consensus Binding Sequences 

The probe used for random binding site selection contained the sequence
5'-GGCTGAGTCTGAACGGATCCN25CCTCGAGACTGAGCGTCG-3' (SEQ ID NO.: 22). Four
rounds of selection were performed as described in Pomerantz and Sharp, Biochemistry 33:10851
(1994), except that 100 ng poly[d(I-C)]/poly[d(I-C)] and 0.025% Nonidet P-40 were included in
the binding reaction. Selections used 5 ng randomized DNA in the first round and approximately
1 ng in subsequent rounds. Binding reactions contained 6.4 ng of GST-ZFHD1 in round 1, 1.6 ng in
round 2, 0.4 ng in round 3 and 0.1 ng in round 4.

After four rounds of selection, 16 sites were cloned and sequenced (SEQ ID NOS.: 1-16,
Figure 1B). Comparing these sequences revealed the consensus binding site 5'-TAATTANGGGNG-3'
(SEQ ID NO.: 17). The 5' half of this consensus, TAATTA, resembled a canonical homeodomain
binding site TAATMN (Laughon, (1991)), and matched the site (TAATNA) that is preferred by
the Oct-1 homeodomain in the absence of the POU-specific domain (Verrijzer et al., EMBO
11:4993 (1992)). The 3' half of the consensus, NGGGNG, resembled adjacent binding sites for
fingers 2 (TGG) and 1 (GCG) of Zif268. The guanines were more tightly conserved than the other
positions in these zinc finger subsites, and the crystal structure shows that these are the positions
of the critical side chain-base interactions (Pavletich and Pabo (1991)).
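
The comparisons made in this paragraph can be reproduced with a simple pattern matcher, sketched below in Python. The sequences are those quoted above; the function name and the rule that N matches any base are assumptions made for the illustration, and combining TAATNA with NGGGNG into one relaxed pattern is likewise done only for this sketch.

```python
def matches(site: str, pattern: str) -> bool:
    """True if `site` fits `pattern`, where N in the pattern stands for any base."""
    return len(site) == len(pattern) and all(
        p == "N" or p == s for s, p in zip(site, pattern)
    )


CONSENSUS = "TAATTANGGGNG"                       # SEQ ID NO.: 17
five_prime_half, three_prime_half = CONSENSUS[:6], CONSENSUS[6:]

# The 5' half fits the TAATNA site preferred by the Oct-1 homeodomain.
print(matches(five_prime_half, "TAATNA"))        # True

# The adjacent finger 2 (TGG) and finger 1 (GCG) subsites fit the 3' half.
print(matches("TGG" + "GCG", three_prime_half))  # True

# The hybrid site TAATGATGGGCG fits the relaxed pattern TAATNA + NGGGNG.
print(matches("TAATGATGGGCG", "TAATNA" + three_prime_half))  # True
```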




The consensus sequence of ZFHD1 was determined (5'-TAATTANGGGNG-3', SEQ ID
NO.: 17), but because of the internal symmetry of the TAATTA subsite this sequence was
consistent with the homeodomain binding in either of two orientations (Figure 1C, compare mode
1 and mode 2). The second arrangement (Figure 1C, mode 2), in which the critical TAAT is on the
other strand and directly juxtaposed with the zinc finger (TGGGCG) subsites, was considered
unlikely since modeling suggested that this arrangement required a linker to span a large
distance between the carboxyl-terminal region of finger 2 and the amino-terminal region of the
homeodomain.
homeodomain. 

To determine how the homeodomain binds to the TAATTA sequence in the 5' half of the
consensus, ZFHD1 was tested for binding to probes (5'-TAATGATGGGCG-3', SEQ ID NO.: 21, and
5'-TCATTATGGGCG-3', SEQ ID NO.: 23) designed to distinguish between these orientations.
ZFHD1 bound to the 5'-TAATGATGGGCG-3' probe with a dissociation constant of
8.4 x 10^-[exponent illegible] M, and preferred this probe to the 5'-TCATTATGGGCG-3' probe by a
factor of 33. This suggests that the first four bases of the consensus sequence form the critical
TAAT subsite that is recognized by the homeodomain and that ZFHD1 binds as predicted in the
model shown in mode 1 of Figure 1C.
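
Why these two probes discriminate between the orientations can be seen from a short Python sketch (the function name is an illustrative assumption). TAATTA is its own reverse complement, so it presents a TAAT core on both strands, whereas TAATGA and TCATTA are reverse complements of each other and each presents the TAAT core on one strand only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# The consensus 5' half is palindromic: both strands read TAATTA.
print(revcomp("TAATTA"))   # TAATTA

# The two probe 5' halves are reverse complements of one another.
print(revcomp("TAATGA"))   # TCATTA

# Only the first probe carries TAAT on the top strand, at the first four bases.
for probe in ("TAATGATGGGCG", "TCATTATGGGCG"):
    print(probe, probe.startswith("TAAT"), "TAAT" in revcomp(probe[:6]))
```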

Example 4: Novel Specificity 

ZFHD1, the Oct-1 POU domain (containing a homeodomain and a POU-specific domain,
Pomerantz et al., Genes & Development 6:2047 (1992)) and the three zinc fingers of Zif268
(obtained from M. Elrod-Erickson) were compared for their abilities to distinguish among the
Oct-1 site 5'-ATGCAAATGA-3' (SEQ ID NO.: 20), the Zif268 site 5'-GCGTGGGCG-3' and the
hybrid binding site 5'-TAATGATGGGCG-3' (SEQ ID NO.: 21). DNA-binding reactions contained
10 mM Hepes (pH 7.9), 0.5 mM EDTA, 50 mM KCl, 0.75 mM DTT, 4% Ficoll-400, 300 ng/ml of
bovine serum albumin, with the appropriate protein and binding site in a total volume of 10 µl.
The concentration of binding site was always lower than the apparent dissociation constant by at
least a factor of 10. Reactions were incubated at 30°C for 30 minutes and resolved in 4%
nondenaturing polyacrylamide gels. Apparent dissociation constants were determined as
described in Pomerantz and Sharp, Biochemistry 33:10851 (1994). Probes were derived by cloning
the following fragments into the Kpn I and Xho I sites of pBSKII+ (Stratagene) and excising the
fragment with Asp718 and Hind III.

5 -CCTCGAGGI£AIIAIliGGLGCTAGGTACC-3 ISEQ 10 NO. : 24). 
5 -CCTCGAGGCfiOXAIIlAIIACTAGGTACC-a ISEQ 10 NO.: 25). 
S -CCTCGAGGCfilXllAIlfiCCTAGGTACC-S- (SEO ID NO. : 26). 
5 -CCTCGAGGICAiniSCAIACTAGGTACC-S- (SEQ ID NO. : 27). 






The GST-ZFHD1 protein was titrated into DNA-binding reactions containing the probes
listed at top of each set of lanes in Figure 2. Lanes 1, 6, 11 and 16 contained the protein at 9.8 x
10^-11 M, and protein concentration was increased in 3-fold increments in subsequent lanes of each
set. The chimeric protein ZFHD1 preferred the optimal hybrid site to the octamer site by a
factor of 240 and did not bind to the Zif site.
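
The titration series can be laid out numerically as in the following Python sketch. The starting concentration and the 3-fold increment are taken from the text; the assumption of five lanes per set follows from lanes 1, 6, 11 and 16 beginning each set. The simple binding isotherm used for the fraction of probe bound assumes that the probe concentration is well below the dissociation constant, as stated for these reactions, and the dissociation constant in the example is a hypothetical placeholder rather than a measured value.

```python
from typing import List


def lane_concentrations(first_lane_molar: float, fold: float = 3.0, lanes: int = 5) -> List[float]:
    """Protein concentration in each lane of one set, increasing `fold`-fold per lane."""
    return [first_lane_molar * fold ** i for i in range(lanes)]


def fraction_bound(protein_molar: float, kd_molar: float) -> float:
    """Fraction of probe bound when probe concentration is far below Kd,
    so that free protein is approximately equal to total protein."""
    return protein_molar / (protein_molar + kd_molar)


series = lane_concentrations(9.8e-11)   # first lane of each ZFHD1 set
HYPOTHETICAL_KD = 1.0e-9                # placeholder value, not from the source

for lane, conc in enumerate(series, start=1):
    print(f"lane {lane}: {conc:.2e} M, fraction bound {fraction_bound(conc, HYPOTHETICAL_KD):.2f}")
```

Half of the probe is bound when the protein concentration equals the dissociation constant, which is why a titration that brackets the constant allows it to be estimated from the gel.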

The Oct-1-POU protein was titrated into DNA-binding reactions as with ZFHD1, but
lanes 1, 6, 11 and 16 contained the protein at 2.1 x 10^-[exponent illegible] M. The POU domain of
Oct-1 bound to the octamer site with a dissociation constant of 1.8 x 10^-[exponent illegible] M,
preferring this site to the hybrid sequences by factors of 10 and 30, and did not bind to the Zif
site.

A peptide containing Zif fingers 1, 2 and 3 was titrated into DNA-binding reactions as
with ZFHD1 and the Oct-1-POU protein, with lanes 1, 6, 11 and 16 containing the peptide at 3.3 x
10^-[exponent illegible] M. The three fingers of Zif268 bound to the Zif site with a dissociation
constant of 3.3 x 10^-[exponent illegible] M, and did not bind to the other three sites. These
experiments show that ZFHD1 binds tightly and specifically to the hybrid site and displays
DNA-binding specificity that is clearly distinct from that of either of the original proteins.

Example 5: in vivo Activity 

ZFHD1 was fused to a transcriptional activation domain, and transfection experiments
were used to determine whether the novel DNA-binding protein could function in vivo. An
expression plasmid encoding ZFHD1 fused to the carboxyl-terminal 81 amino acids of the Herpes
Simplex Virus VP16 protein (ZFHD1-VP16) was co-transfected into 293 cells with reporter
constructs containing the SV40 promoter and the firefly luciferase gene (Figure 3). The 293 cells
were co-transfected with 5 µg of reporter vector, 10 µg of expression vector, and 5 µg of
pCMV-hGH used as an internal control. The reporter vectors contained two tandem copies of either
the ZFHD1 site (TAATGATGGGCG), the Oct-1 site (ATGCAAATGA), the Zif site (GCGTGGGCG) or
no insert.

The ZFHD1-VP16 expression vector was constructed by cloning a fragment encoding the ten
amino acid polypeptide epitope MYPYDVPDYA, ZFHD1, and VP16 residues 399-479 (Pellett et
al., Proc. Natl. Acad. Sci. USA 82:5870 (1985)) into the Not I and Apa I sites of Rc/CMV
(Invitrogen). Reporter vectors were constructed by cloning into the Xho I and Kpn I sites of
pGL2-Promoter (Promega) the following fragments:
R-.r^gTArrAfi TATf^rAAATCA CTSCAG TATRrAAATRA rrTCGAG-a- (SEQ ID NO. : 28). 

g • -r^r^TArrAri GrGTGGGrG rTRCAC GrGTGGGrG rrT CGAG-3' ISEO ID NO.: 29). 

-rCTArrARTAATf^ATGGGCGCTGCAG TAATGATGGGrGCCTCCAG-a ' (SEO IDNO. : 30). 






The 293 cells were transfected using calcium phosphate precipitation with a glycerol
shock as described in Ausubel et al., Eds., Current Protocols in Molecular Biology (John Wiley &
Sons, New York (1994)). Quantitation of hGH production was performed using the Tandem-R
HGH Immunoradiometric Assay (Hybritech Inc., San Diego, CA) according to the manufacturer's
instructions. Cell extracts were made 48 hours after transfection and luciferase activity was
determined using 10 µl of 100 µl total extract/10 cm plate and 100 µl of Luciferase Assay Reagent
(Promega) in a ML2250 Luminometer (Dynatech Laboratories, Chantilly, VA) using the
enhanced flash program and integrating for 20 seconds with no delay. The level of luciferase
activity obtained, normalized to hGH production, was set to 1.0 for the co-transfection of
Rc/CMV with the no-insert reporter pGL2-Promoter.
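
The normalization just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the applicants' analysis: the function name and every numeric reading below are hypothetical placeholders, not data from the experiments.

```python
def fold_activation(luciferase, hgh, ref_luciferase, ref_hgh):
    """Luciferase activity normalized to hGH, relative to a reference transfection.

    The reference is the co-transfection of Rc/CMV with the no-insert
    reporter pGL2-Promoter, whose normalized activity is defined as 1.0.
    """
    if hgh <= 0 or ref_hgh <= 0 or ref_luciferase <= 0:
        raise ValueError("readings must be positive")
    normalized = luciferase / hgh            # corrects for transfection efficiency
    reference = ref_luciferase / ref_hgh     # same correction for the reference dish
    return normalized / reference


# Hypothetical readings in arbitrary units, for illustration only.
reference_value = fold_activation(1200.0, 4.0, 1200.0, 4.0)
sample_value = fold_activation(15000.0, 5.0, 1200.0, 4.0)
print(reference_value, sample_value)
```

Dividing by the hGH signal first means that a dish which simply took up more DNA does not appear as a stronger activator.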

To determine whether the chimeric protein could specifically regulate gene expression,
reporter constructs containing two tandem copies of either the ZFHD1 site 5'-TAATGATGGGCG-3',
the octamer site 5'-ATGCAAATGA-3' or the Zif site 5'-GCGTGGGCG-3' inserted upstream of
the SV40 promoter were tested. When the reporter contained two copies of the ZFHD1 site, the
ZFHD1-VP16 protein stimulated the activity of the promoter in a dose-dependent manner.
Furthermore, the stimulatory activity was specific for the promoter containing the ZFHD1
binding sites. At levels of protein which stimulated this promoter by 44-fold, no stimulation
above background was observed for promoters containing the octamer or Zif sites. Thus, ZFHD1
efficiently and specifically recognized its target site in vivo.

Example 6: Additional Examples 

The following additional examples illustrate chimeric proteins containing the composite
DNA-binding domain ZFHD1 together with various other domains, and the use of these
chimeras in constitutive and ligand-dependent transcriptional activation.

A. Plasmids 

pCGNNZFHD1

An expression vector for directing the expression of ZFHD1 coding sequence in
mammalian cells was prepared as follows. Zif268 sequences were amplified from a cDNA clone
by PCR using primers 5' Xba/Zif and 3' Zif+G. Oct1 homeodomain sequences were amplified from
a cDNA clone by PCR using primers 5' Not Oct HD and Spe/Bam 3' Oct. The Zif268 PCR fragment
was cut with XbaI and NotI. The Oct1 PCR fragment was cut with NotI and BamHI. Both
fragments were ligated in a 3-way ligation between the XbaI and BamHI sites of pCGNN (Attar






and Gilman, 1992) to make pCGNNZFHD1 in which the cDNA insert is under the
transcriptional control of human CMV promoter and enhancer sequences and is linked to the
nuclear localization sequence from SV40 T antigen. The plasmid pCGNN also contains a gene for
ampicillin resistance which can serve as a selectable marker.

pCGNNZFHD1-p65

An expression vector for directing the expression in mammalian cells of a chimeric
transcription factor containing the composite DNA-binding domain, ZFHD1, and a transcription
activation domain from p65 (human) was prepared as follows. The sequence encoding the
C-terminal region of p65 containing the activation domain (amino acid residues 450-550) was
amplified from pCGN-p65 using primers p65 5' Xba and p65 3' Spe/Bam. The PCR fragment was
digested with XbaI and BamHI and ligated between the SpeI and BamHI sites of pCGNN
ZFHD1 to form pCGNN ZFHD1-p65AD.

The p65 transcription activation sequence contains the following linear sequence:

CTGGGGGCCTTGCTTGGCAACAGCACAGACCCAGCTGTGTTCACAGACCTGGCATCCGTCGA
CAACTCCGAGTTTCAGCAGCTGCTGAACCAGGGCATACCTGTGGCCCCCCACACAACTGAGC
CCATGCTGATGGAGTACCCTGAGGCTATAACTCGCCTAGTGACAGGGGCCCAGAGGCCCCCC
GACCCAGCTCCTGCTCCACTGGGGGCCCCGGGGCTCCCCAATGGCCTCCTTTCAGGAGATGA
AGACTTCTCCTCCATTGCGGACATGGACTTCTCAGCCCTGCTGAGTCAGATCAGCTCC

pCGNNZFHD1-FKBPx3

An expression vector for directing the expression of ZFHD1 linked to three tandem
repeats of human FKBP was prepared as follows. Three tandem repeats of human FKBP were
isolated as an XbaI-BamHI fragment from pCGNNF3 and ligated between the SpeI and BamHI
sites of pCGNNZFHD1 to make pCGNNZFHD1-FKBPx3 (ATCC Accession No. ).

pZHWTx8SVSEAP

A reporter gene construct containing eight tandem copies of a ZFHD1 binding site
(Pomerantz et al., 1995) and a gene encoding secreted alkaline phosphatase (SEAP) was
prepared by ligating the tandem ZFHD1 binding sites between the NheI and BglII sites of
pSEAP-Promoter Vector (Clontech) to form pZHWTx8SVSEAP. The ZHWTx8SVSEAP reporter
contains two copies of the following sequence in tandem:






CTAGCTAATGATGGGCGCTCGAGTAATGATGGGCGGTCGACTAATGATGGGCGCTCGAGTAATGATG

The ZFHD1 binding sites are the TAATGATGGGCG repeats (underlined in the original).

pCGNNF1 and F2

One or two copies of FKBP12 were amplified from pNF3VE using primers FKBP 5' Xba and
FKBP 3' Spe/Bam. The PCR fragments were digested with XbaI and BamHI and ligated
between the XbaI and BamHI sites of pCGNN vector to make pCGNN F1 or pCGNN F2.
pCGNNZFHD1-FKBPx3 can serve as an alternate source of the FKBP cDNA.



pCGNNF3 

A fragment containing two tandem copies of FKBP was excised from pCGNN F2 by digesting
with XbaI and BamHI. This fragment was ligated between the SpeI and BamHI sites of
pCGNNF1.
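
The reason an XbaI-BamHI fragment can be placed between SpeI and BamHI sites is that XbaI (T^CTAGA) and SpeI (A^CTAGT) leave the same CTAG overhang, and the joined ends form a hybrid sequence that neither enzyme recognizes. The following Python sketch illustrates this on the top strand only. The toy sequences are hypothetical (lowercase letters stand in for coding sequence and flanking DNA), and the helper names are assumptions made for this illustration, not part of the described procedure.

```python
# Recognition sites; each of these enzymes cuts after the first base of its site.
SITES = {"XbaI": "TCTAGA", "SpeI": "ACTAGT", "BamHI": "GGATCC"}
CUT_OFFSET = 1


def digest(seq, first, second):
    """Double digest, top strand only: return (left arm, fragment, right arm)."""
    a = seq.index(SITES[first]) + CUT_OFFSET
    b = seq.index(SITES[second], a) + CUT_OFFSET
    return seq[:a], seq[a:b], seq[b:]


def count_sites(seq):
    """Number of intact recognition sites for each enzyme."""
    return {name: seq.count(site) for name, site in SITES.items()}


# Hypothetical stand-ins: a vector carrying one domain ("aaaa") and a donor
# plasmid carrying another ("bbbb"), each flanked XbaI ... SpeI ... BamHI.
vector = "nnnn" + "TCTAGA" + "aaaa" + "ACTAGT" + "nn" + "GGATCC" + "nnnn"
donor = "nnnn" + "TCTAGA" + "bbbb" + "ACTAGT" + "nn" + "GGATCC" + "nnnn"

left_arm, _, right_arm = digest(vector, "SpeI", "BamHI")   # open the vector
_, insert, _ = digest(donor, "XbaI", "BamHI")              # excise the insert
product = left_arm + insert + right_arm                    # ligate

print(product)
print(count_sites(product))
```

In this toy model the junction reads ACTAGA, so the product again has exactly one XbaI, one SpeI and one BamHI site, with the SpeI site at the 3' end of the newly added copy. That is what allows the same step to be repeated to add further tandem copies.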



pCGNN F3VP16 

The C-terminal region of the Herpes Simplex Virus protein, VP16 (AA 418-490) containing
the activation domain was amplified from pCG-Gal4-VP16 using primers VP16 5' Xba and VP16
3' Spe/Bam. The PCR fragment was digested with XbaI and BamHI and ligated between the
SpeI and BamHI sites of pCGNN F3 plasmid.

pCGNN F3p65 

The XbaI and BamHI fragment of p65 containing the activation domain was prepared as
described above. This fragment was ligated between the SpeI and BamHI sites of pCGNN F3.

B. Primers 

5' Xba/Zif           5' ATGCTCTAGAGAACGCCCATATGCTTGCCCT
3' Zif+G             5' ATGCGCGGCCGCCGCCTGTGTGGGTGCGGATGTG
5' Not OctHD         5' ATGCGCGGCCGCAGGAGGAAGAAACGCACCAGC
Spe/Bam 3' Oct       5' GCATGGATCCGATTCAACTAGTGTTGATTCTTTTTTCTTTCTGGCGGCG
FKBP 5' Xba          5' TCAGTCTAGAGGAGTGCAGGTGGAAACCAT
FKBP 3' Spe/Bam      5' TCAGGGATCCTCAATAACTAGTTTCCAGTTTTAGAAGCTC
VP16 5' Xba          5' ACTGTCTAGAGTCAGCCTGGGGGACGAG
VP16 3' Spe/Bam      5' GCATGGATCCGATTCAACTAGTCCCACCGTACTCGTCAATTCC
p65 5' Xba           5' ATGCTCTAGACTGGGGGCCTTGCTTGGCAAC
p65 3' Spe/Bam       5' GCATGGATCCGCTCAACTAGTGGAGCTGATCTGACTCAG
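
As a consistency check on the primer list, the 3' portion of each primer should anneal to the end of the sequence it amplifies, while the 5' portion carries the added restriction sites. The Python sketch below checks this for the two p65 primers against the first and last lines of the p65 activation sequence given under pCGNNZFHD1-p65. The sequences are transcribed from this document with OCR characters corrected, and the helper names are assumptions made for this illustration.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def annealing_length(primer, template):
    """Length of the longest 3' end of primer that matches the start of template."""
    for n in range(min(len(primer), len(template)), 0, -1):
        if template.startswith(primer[-n:]):
            return n
    return 0


# First and last lines of the p65 activation sequence listed in the text.
P65_START = "CTGGGGGCCTTGCTTGGCAACAGCACAGACCCAGCTGTGTTCACAGACCTGGCATCCGTCGA"
P65_END = "AGACTTCTCCTCCATTGCGGACATGGACTTCTCAGCCCTGCTGAGTCAGATCAGCTCC"

P65_5_XBA = "ATGCTCTAGACTGGGGGCCTTGCTTGGCAAC"
P65_3_SPE_BAM = "GCATGGATCCGCTCAACTAGTGGAGCTGATCTGACTCAG"

# The forward primer anneals to the top strand as written; the reverse
# primer anneals to the complementary strand, so compare it with the
# reverse complement of the template end.
forward_match = annealing_length(P65_5_XBA, P65_START)
reverse_match = annealing_length(P65_3_SPE_BAM, reverse_complement(P65_END))
print(forward_match, reverse_match)
```

The unmatched 5' tails are where the XbaI site (forward primer) and the SpeI and BamHI sites (reverse primer) sit, which is what makes the PCR product compatible with the cloning steps described above.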



C. Dimerizing agent 

FK1012 consists of two molecules of the natural product FK506 covalently joined to one
another by a synthetic linker and can be prepared from FK506 using published procedures. See
e.g. PCT/US94/01617 and Spencer et al., 1993. FK1012 is capable of binding to two FKBP domains
and functioning as a dimerizing agent for FKBP-containing chimeric proteins.




[Structure: FK1012]

(i) ZFHD1-p65 and ZFHD1-VP16 chimeric proteins activate transcription of a target gene linked
to a nucleotide sequence containing ZFHD1 binding sites.

HT1080 cells were grown in MEM (GIBCO BRL) supplemented with 10% Fetal Bovine Serum.
Cells in 35 mm dishes were transiently transfected by lipofection as follows: 10, 50, 250 ng of
ZFHD-activation domain fusion plasmids together with 1 µg of pZHWTx8SVSEAP plasmid
DNA were added to a microfuge tube with pUC118 plasmid to a total of 25 µg DNA per tube.
The DNA in each tube was then mixed with 20 µg lipofectamine in 200 µl OPTIMEM (GIBCO
BRL). The DNA-lipofectamine mix was incubated at room temperature for 20 min. Another 800 µl
of OPTIMEM was added to each tube, mixed and added to HT1080 cells previously washed with
1 ml DMEM (GIBCO BRL). The cells were incubated at 37 °C for 5 hrs. At this time, the
DNA-lipofectamine media was removed and the cells were refed with 2 ml MEM containing 10% Fetal




Bovine Serum. After 24 hrs incubation at 37 °C, 20 µl of media was removed and assayed for
SEAP activity as described (Spencer et al., 1993).

Results 

Both ZFHD1-VP16 and ZFHD1-p65 support transcriptional activation of a gene encoding
SEAP linked to ZFHD1 binding sites. The results are shown in Figure 4A.

(ii) FK1012-dependent transcriptional activation with ZFHD1-FKBPx3 and FKBPx3-VP16 or
FKBPx3-p65

293 cells were grown in DMEM (Gibco BRL) supplemented with 10% Bovine Calf Serum.
Cells in 35 mm dishes (25 x 10^ cells/dish) were transiently transfected with use of calcium
phosphate precipitation (Ausubel et al., 1994). Each dish received 375 ng pZHWTx8SVSEAP;
12 ng pCGNNZFHD1-FKBPx3 and 25 ng pCGNNFKBPx3-VP16 or pCGNNFKBPx3-p65.
Following transfection, 2 ml fresh media was added and supplemented with FK1012 to the
desired concentration. After a 24 hour incubation a 100 µl aliquot of media was removed and
assayed for SEAP activity as described (Spencer et al., 1993).

Results 

ZFHD1-FKBPx3 supports FK1012-dependent transcriptional activation in conjunction with
FKBPx3-VP16 or FKBPx3-p65. Peak activation was observed at an FK1012 concentration of 100 nM.
See Figure 4B.

(iii) Synthetic dimerizer-dependent transcriptional activation with ZFHD1-FKBPx3 and
FKBPx3-VP16 or FKBPx3-p65

An analogous experiment was conducted using a wholly synthetic dimerizer in place of
FK1012. Like FK1012, the synthetic dimerizer is a divalent FKBP-binder and is capable of
dimerizing chimeric proteins which contain FKBP domains. In this experiment, 293 cells were
grown in DMEM supplemented with 10% Bovine Calf Serum. Cells in 10 cm dishes were
transiently transfected by calcium phosphate precipitation (Natesan and Gilman, 1995, Mol.
Cell. Biol. 15, 5975-5982). Each plate received 1 µg of pZHWTx8SVSEAP reporter, 50 ng
pCGNNZFHD1-FKBPx3, 50 ng pCGNNF3p65 or pCGNNF3VP16. Following transfection, 2 ml
fresh media was added and supplemented with a synthetic dimerizer to the desired
concentration. After 24 hrs, 100 µl of the media was assayed for SEAP activity as described
(Spencer et al., 1993).






Results 

ZFHD1-FKBPx3 supports synthetic dimerizer-dependent transcriptional activation in
conjunction with FKBPx3-VP16 or FKBPx3-p65. See Figure 4C.

References 

1. Attar, R.M., and M.Z. Gilman. 1992. Mol. Cell. Biol. 12:2432-2443.

2. Ausubel et al., Eds. 1994. Current Protocols in Molecular Biology (Wiley, NY).

3. Pomerantz, J.L., et al. 1995. Science 267:93-96.

4. Spencer et al. 1993. Science 262:1019-1024.

Example 7: Rapamycin-dependent transcriptional activation with ZFHD1-FKBPx3 and FRAP-p65
in whole animals

Using the approach described in Example 6, constructs were prepared encoding the ZFHD1-FKBPx3
fusion protein, a second fusion protein containing the FKBP:rapamycin binding ("FRB")
region of FRAP linked to the p65 activation domain, and a reporter cassette containing a gene
encoding human growth hormone linked to multiple ZFHD1 binding sites. The natural product,
rapamycin, forms a ternary complex with FKBP12 and FRAP. Similarly, rapamycin is capable of
binding to one or more of the FKBP domains and FRAP FRB domains of the fusion proteins. The
three constructs were introduced into HT1080 cells which were then shown to support
rapamycin-dependent expression of the hGH gene in cell culture, analogously to the experiments
described in Example 6.

2 X 10^ cells from the transfected HT1080 culture were administered to nu/nu mice by
intramuscular injection. Following cell implantation, rapamycin was administered i.v. over a
range of doses (from 10 - 10,000 µg/kg). Serum samples were collected from the mice 17 hours after
rapamycin administration. Control groups consisted of mice that received no cells but 1.0 mg/kg
rapamycin (i.v.) as well as mice that received the cells but no rapamycin.

Dose-responsive expression of hGH was observed (as circulating hGH) over the range of
rapamycin doses administered. Neither control group produced measurable hGH. The limit of
detection of the hGH assay is 0.0125 ng/ml. See Figure 5.

These data show functional DNA binding of ZFHD1-FKBP(x3) to a ZFHD1 binding site in
the context of dimerization with another fusion protein in whole animals. These data
demonstrate that in vivo administration of a dimerizing agent can regulate gene expression in




whole animals of secreted gene products from cells containing the fusion proteins and a responsive
target gene cassette. We have previously demonstrated that a bolus hGH administration, either
i.p. or i.v., results in rapid hGH clearance with a half-life of less than 2 minutes and
undetectable levels by 30 minutes. Therefore, the observed hGH secretion in this example
appears to be a sustained phenomenon.
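
The clearance argument can be made concrete with a short calculation. The sketch below assumes simple first-order (single-exponential) clearance, which the text does not state explicitly, and uses the quoted 2 minute figure as an upper bound on the half-life. The function name is chosen here for illustration.

```python
def fraction_remaining(elapsed_min, half_life_min):
    """Fraction of a bolus still circulating, assuming first-order clearance."""
    return 0.5 ** (elapsed_min / half_life_min)


# 2 minutes is the upper bound on the half-life quoted in the text.
after_30_min = fraction_remaining(30.0, 2.0)        # 15 half-lives
after_17_hr = fraction_remaining(17 * 60.0, 2.0)    # 510 half-lives

print(f"{after_30_min:.2e}")
print(f"{after_17_hr:.2e}")
```

Under that assumption roughly three parts in 100,000 of a bolus remain after 30 minutes, and essentially none after 17 hours, so hGH measured in serum 17 hours after rapamycin administration is consistent with ongoing secretion rather than carry-over from an early burst.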

Example 8: FRAP FRB constructs

This Example provides further background and information relevant to constructs encoding
chimeric proteins containing an FRB domain derived from FRAP for use in the practice of this
invention. The VP16-FRB construct described below is analogous to the p65-FRB construct used
in Example 7.

Rapamycin is a natural product which binds to a FK506-binding protein, FKBP, to form a
rapamycin:FKBP complex. That complex binds to the protein FRAP to form a ternary,
[FKBP:rapamycin]:[FRAP], complex. The rapamycin-dependent association of FKBP12 and a 289
kDa mammalian protein termed FRAP, RAFT1 or RAPT1 and its yeast homologs DRR and TOR
(hereafter referred to as "FRAP") have been described by several research groups. See e.g. Brown
et al., 1994, Nature 369:756-758, Sabatini et al., 1994, Cell 78:35-43, Chiu et al., 1994, Proc. Natl.
Acad. Sci. USA 91:12574-12578, Chen et al., 1994, Biochem. Biophys. Res. Comm. 203:1-7, Kunz et
al., 1993, Cell 73:585-596, Cafferkey et al., 1993, Mol. Cell. Biol. 13:6012-6023. Chiu et al., supra,
and Stan et al., 1994, J. Biol. Chem. 269:32027-32030 describe the rapamycin-dependent binding of
FKBP12 to smaller subunits of FRAP.




[Structure: rapamycin]



Constructs encoding FRAP domain(s)-VP16 transcriptional activation domain(s)-epitope tag.

The starting point for assembling this construct was the eukaryotic expression vector
pBJ5/NF1E, described in PCT/US94/01617. pBJ5 is a derivative of pCDL-SR (MCB 8, 466-72) in
which a polylinker containing 5' SacII and 3' EcoRI sites has been inserted between the 16S splice
site and the poly A site. To construct pBJ5/NF1E a cassette was cloned into this polylinker that
contained a Kozak sequence and start site, the coding sequence of the SV40 T antigen nuclear
localization sequence (NLS), a single FKBP domain, and an epitope tag from the H. influenza
haemagglutinin protein (HA), flanked by restriction sites as shown below:

      Kozak         SV40 NLS                   FKBP (5')
            M  L  D  P  K  K  K  R  K  V  L  E  G  V  Q  V  E ...
CCGCGGCCACCATGCTCGACCCTAAGAAGAAGAGAAAGGTACTCGAGGGCGTGCAGGTGGAG...
SacII         (X/S)                      XhoI

   FKBP (3')            HA (flu) tag
... L  L  K  L  E  V  D  Y  P  Y  D  V  P  D  Y  A  L  D End
...CTTCTAAAACTGGAAGTCGACTATCCGTACGACGTACCAGACTACGCACTCGACTAAGAATTC
                  SalI                             (X/S)    EcoRI

where (X/S) denotes the result of a ligation event between the compatible products of digestion
by XhoI and SalI, to produce a sequence that is cleavable by neither enzyme. Thus the XhoI and
SalI sites that flank the FKBP coding sequence are unique.
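
The reading frame and the uniqueness of the flanking sites can be checked directly from the two cassette sequences shown above. The Python sketch below translates them with the standard genetic code and counts XhoI and SalI sites. The sequences are transcribed from this document with OCR characters corrected, and the helper names are assumptions made for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, start=0):
    """Translate from start, codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# 5' and 3' ends of the pBJ5/NF1E cassette as listed in the text.
FIVE_PRIME = "CCGCGGCCACCATGCTCGACCCTAAGAAGAAGAGAAAGGTACTCGAGGGCGTGCAGGTGGAG"
THREE_PRIME = "CTTCTAAAACTGGAAGTCGACTATCCGTACGACGTACCAGACTACGCACTCGACTAAGAATTC"

n_terminus = translate(FIVE_PRIME, FIVE_PRIME.index("ATG"))
c_terminus = translate(THREE_PRIME)
print(n_terminus)
print(c_terminus)

# Intact XhoI (CTCGAG) and SalI (GTCGAC) sites; the hybrid CTCGAC is neither.
for label, seq in (("5' end", FIVE_PRIME), ("3' end", THREE_PRIME)):
    print(label, seq.count("CTCGAG"), seq.count("GTCGAC"), seq.count("CTCGAC"))
```

The translation places the SV40 NLS (PKKKRKV) and the HA epitope (YPYDVPDYA) in frame with the FKBP coding sequence, and the counts show one intact XhoI site at the 5' end and one intact SalI site at the 3' end, with the hybrid CTCGAC sequences cut by neither enzyme.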



The series of constructs encoding FRAP-VP16 fusions is assembled from pBJ5/NF1E in two
steps: (i) the XhoI-SalI restriction fragment encoding FKBP is excised and replaced with
fragments encompassing all or part of the coding sequence of human FRAP, obtained by PCR
amplification, generating construct NR1E and relatives (where R denotes FRAP or a portion
thereof); (ii) the coding sequence of the VP16 activation domain is cloned into the unique SalI site
of these vectors to yield construct NR1V1E and relatives. At each stage additional
manipulations are performed to generate constructs encoding multimers of the FRAP-derived
and/or VP16 domains.

(i) Portions of human FRAP that include the region required for FRAP binding are amplified
by PCR using a 5' primer that contains a XhoI site and a 3' primer that contains a SalI site. The
amplified region can encode full-length FRAP (primers 1 and 4: fragment a); residues 2012
through 2144 (a 133 amino acid region that retains the ability to bind FKBP-rapamycin; see Chiu
et al. (1994) Proc. Natl. Acad. Sci. USA 91: 12574-12578) (primers 2 and 5: fragment b); or residues






2025 through 2114 (a 90 amino acid region that also retains this ability; see Chen et al. (1995)
Proc. Natl. Acad. Sci. USA 92: 4947-4951) (primers 3 and 6: fragment c). The DNA is amplified
from human cDNA or a plasmid containing the FRAP gene by standard methods, and the PCR
product is isolated and digested with SalI and XhoI. Plasmid pBJ5/NF1E is digested with SalI
and XhoI and the cut vector purified. The digested PCR products are ligated into the cut vector to
produce the constructs NRa1E, NRb1E and NRc1E, where Ra, Rb and Rc denote the full-length or
partial FRAP fragments as indicated above. The constructs are verified by DNA sequencing.

Multimers of the FRAP domains are obtained by isolating the Ra, Rb or Rc sequences from the
NRa1E, NRb1E and NRc1E vectors as XhoI/SalI fragments and then ligating these fragments
back into the parental construct linearized with XhoI. Constructs containing two, three or more
copies of the FRAP domain (designated NRa2E, NRa3E, NRb2E, NRb3E etc.) are identified by
restriction or PCR analysis and verified by DNA sequencing.

5' ends of amplified products:

FRAP fragment a (full-length: primer 1)

         L  E  L  G  T  G  P  A  A
5' CGAGTCTCGAGCTTGGAACCGGACCTGCCGCC
        XhoI

FRAP fragment b (residues 2012-2144: primer 2)

         L  E  V  S  E  E  L  I  R
5' CGAGTCTCGAGGTGAGCGAGGAGCTGATCCGA
        XhoI

FRAP fragment c (residues 2025-2114: primer 3)

         L  E  E  M  W  H  E  G  L
5' CGAGTCTCGAGGAGATGTGGCATGAAGGCCTG
        XhoI

3' ends of amplified products:

FRAP fragment a (full-length: primer 4)

    I  G  W  C  P  F  W  V  D
5' ATTGGCTGGTGCCCTTTCTGGGTCGACCGAGT
3' TAACCGACCACGGGAAAGACCCAGCTGGCTCA
                        SalI

FRAP fragment b (residues 2012-2144: primer 5)

    L  A  V  P  G  T  Y  V  D
5' TTGGCTGTGCCAGGAACATATGTCGACCGAGT
3' AACCGACACGGTCCTTGTATACAGCTGGCTCA
                        SalI

FRAP fragment c (residues 2025-2114: primer 6)

    F  R  R  I  S  K  Q  V  D
5' TTCCGACGAATCTCAAAGCAGGTCGACCGAGT
3' AAGGCTGCTTAGAGTTTCGTCCAGCTGGCTCA
                        SalI

(ii) The VP16 transcriptional activation domain (amino acids 413-490) is amplified by PCR
using a 5' primer (primer 7) containing a XhoI site and a 3' primer (primer 8) containing a SalI
site. The PCR product is isolated, digested with SalI and XhoI, and ligated into plasmid
pBJ5/NF1E digested with SalI and XhoI to generate the intermediate NV1E. The construct is
verified by restriction or PCR analysis and DNA sequencing. Multimerized VP16 domains are
created by isolating the single VP16 sequence as a XhoI-SalI fragment from NV1E, and then
ligating this fragment back into NV1E that is linearized with XhoI. This process generates
constructs NV2E, NV3E and NV4E etc. which can be identified by restriction or PCR analysis and
verified by DNA sequencing.



5' end of PCR product

              413
         L  E  A  P  P  T  D  V
5' CGACACTCGAGGCCCCCCCGACCGATGTC
        XhoI

3' end of PCR product

               490
    D  E  Y  G  G  V  D
5' GACGAGTACGGTGGGGTCGACTGTCG
3' CTGCTCATGCCACCCCAGCTGACAGC
                  SalI

The final constructs encoding fusions of portions of FRAP with VP16 are created by transferring
the VP16 sequences into the series of FRAP-encoding vectors described in (i). XhoI-SalI fragments
encoding the 1, 2, 3 and 4 copies of the VP16 activation domains are generated by digestion of
NV1E, NV2E, NV3E and NV4E. These fragments are then ligated into vectors NRa1E, NRb1E
and NRc1E linearized with SalI, generating NRa1V1E, NRb1V1E, NRc1V1E, NRa1V2E,
NRb1V2E, etc. Similarly, vectors encoding multiple copies of the FRAP domains are obtained by
ligation of the same fragments into vectors NRa2E, NRa3E, NRb2E, NRb3E etc. All of these
vectors are identified by restriction or PCR analysis and verified by DNA sequencing. Thus the






final series of vectors encodes (from the N to the C terminus) a nuclear localization sequence, one
or more FRAP-derived domains fused N-terminally to one or more VP16 transcriptional
activation domains (contained on a single XhoI-SalI fragment), and an epitope tag.

Oligonucleotides:

1   5' CGAGTCTCGAGCTTGGAACCGGACCTGCCGCC
2   5' CGAGTCTCGAGGTGAGCGAGGAGCTGATCCGA
3   5' CGAGTCTCGAGGAGATGTGGCATGAAGGCCTG
4   5' ACTCGGTCGACCCAGAAAGGGCACCAGCCAAT
5   5' ACTCGGTCGACATATGTTCCTGGCACAGCCAA
6   5' ACTCGGTCGACCTGCTTTGAGATTCGTCGGAA
7   5' CGACACTCGAGGCCCCCCCGACCGATGTC
8   5' CGACAGTCGACCCCACCGTACTCGTC

Sequence of representative final construct (NRc1V1E):

      Kozak         SV40 NLS                   FRAP (2025-2114)
            M  L  D  P  K  K  K  R  K  V  L  E  E  M  W  H  E ...
CCGCGGCCACCATGCTCGACCCTAAGAAGAAGAGAAAGGTACTCGAGGAGATGTGGCATGAA...
SacII         (X/S)                      XhoI

   FRAP (2025-2114)     VP16 (413-490)    VP16 (413-490)
... R  I  S  K  Q  V  E  A  P  P  T  D ... D  E  Y  G  G  V  D
...CGAATCTCAAAGCAGGTCGAGGCCCCCCCGACCGAT...GACGAGTACGGTGGGGTCGAC
                  (S/X)                                  SalI

HA (flu) tag
 Y  P  Y  D  V  P  D  Y  A  L  D End
TATCCGTACGACGTACCAGACTACGCACTCGACTAAGAATTC
                           (X/S)    EcoRI



Example 9: Constructs for Chimeric Proteins Containing Alternative Composite DNA-binding
Regions

The following DNA vectors were prepared containing recombinant DNA sequences encoding
component DNA binding subdomains and composite DNA binding regions containing them.

Constructs.

All plasmids are constructed in pET-19BHA, a pET-19B based vector modified such that all
expressed proteins contain an amino-terminal Histidine Tag for purification and an epitope tag
for immunoprecipitation. pET-19B is a well-known vector for expression of heterologous proteins
in E. coli or in reticulocyte lysates.

Zinc Finger Constructs 

All zinc finger sequences are derived from the human cDNA encoding SRE-ZBP (Attar, R.M.
and Gilman, M.Z. 1992. MCB 12: 2432-2443).

p19B2F: Contains SRE-ZBP zinc fingers 6 and 7 (amino acids 328 to 410) fused in frame to the
epitope tag in p19BHA. DNA encoding ZBP zinc fingers 6 and 7 was generated by PCR using
primers 2F-Xba5' and ZNF-Spe/Bam (see below). The resulting fragment was cut with XbaI and
BamHI and ligated between the XbaI and BamHI sites of pET-19BHA.

p19B4F: Contains SRE-ZBP zinc fingers 4, 5, 6 and 7 (amino acids 300 to 410) fused in frame to
the epitope tag in p19BHA. A DNA fragment encoding ZBP zinc fingers 4, 5, 6 and 7 was
generated by PCR using primers 4F-Xba5' and ZNF-Spe/Bam. The resulting fragment was cut
with XbaI and BamHI and ligated between the XbaI and BamHI sites of pET-19BHA.

p19B7F: Contains SRE-ZBP zinc fingers 1 to 7 (amino acids 216 to 410) fused in frame to the
epitope tag in p19BHA. DNA encoding ZBP zinc fingers 1 to 7 was generated by PCR using
primers 7F-Xba5' and ZNF-Spe/Bam. The resulting fragment was cut with XbaI and BamHI and
ligated between the XbaI and BamHI sites of pET-19BHA.

p19BZF1: Contains SRE-ZBP zinc finger 1 (amino acids 204 to 241) fused in frame to the epitope
tag in p19BHA. DNA encoding ZBP zinc finger 1 was generated by PCR using primers ZBPZF15'


and ZBPZF13'. The resulting fragment was cut with XbaI and BamHI and ligated between the
XbaI and BamHI sites of pET-19BHA.

p19BZF123: Contains SRE-ZBP zinc fingers 1, 2 and 3 (amino acids 204 to 297) fused in frame to the
epitope tag in p19BHA. DNA encoding ZBP zinc fingers 1, 2 and 3 was generated by PCR using
primers ZBPZF15' and ZBPZF33'. The resulting fragment was cut with XbaI and BamHI and
ligated between the XbaI and BamHI sites of pET-19BHA.

Homeodomain Construct 

p19BHH: Contains the Phox1 homeodomain and flanking amino acids (amino acids 43 to 150
(Grueneberg et al., 1992, Science 257: 1089-1095)) fused in frame to the epitope tag in p19BHA.
DNA encoding the Phox1 fragment was generated by PCR using primers Phox HH 5' Primer and
Phox HH Spe/Bam. The resulting fragment was cut with XbaI and BamHI and ligated between
the XbaI and BamHI sites of pET-19BHA.

Zinc Finger/Homeodomain Constructs

p19B2FHH: Contains SRE-ZBP zinc fingers 6 and 7 (amino acids 328 to 410) fused in frame
to the epitope tag in p19BHA followed by the Phox1 homeodomain (amino acids 43 to 150). An
XbaI-BamHI fragment from p19BHH containing sequences encoding the Phox1 homeodomain was
ligated between the SpeI and BamHI sites of p19B2F.

p19B4FHH: Contains SRE-ZBP zinc fingers 4, 5, 6 and 7 (amino acids 300 to 410) fused in
frame to the epitope tag in p19BHA followed by the Phox1 homeodomain (amino acids 43 to
150). An XbaI-BamHI fragment from p19BHH containing sequences encoding the Phox1
homeodomain was ligated between the SpeI and BamHI sites of p19B4F.

p19B7FHH: Contains SRE-ZBP zinc fingers 1 to 7 (amino acids 216 to 410) fused in frame to
the epitope tag in p19BHA followed by the Phox1 homeodomain (amino acids 43 to 150). An
XbaI-BamHI fragment from p19BHH containing sequences encoding the Phox1 homeodomain was
ligated between the SpeI and BamHI sites of p19B7F.






p19BZF1HH: Contains SRE-ZBP zinc finger 1 (amino acids 204 to 241) fused in frame to the
epitope tag in p19BHA followed by the Phox1 homeodomain (amino acids 43 to 150). An
XbaI-BamHI fragment from p19BHH containing sequences encoding the Phox1 homeodomain was
ligated between the SpeI and BamHI sites of p19BZF1.

p19BZF123HH: Contains SRE-ZBP zinc fingers 1, 2 and 3 (amino acids 204 to 297) fused in frame to
the epitope tag in p19BHA followed by the Phox1 homeodomain (amino acids 43 to 150). An
XbaI-BamHI fragment from p19BHH containing sequences encoding the Phox1 homeodomain was
ligated between the SpeI and BamHI sites of p19BZF123.

Homeodomain/Zinc Finger constructs

p19BHH2F: Contains Phox1 homeodomain (amino acids 43 to 150) fused in frame to the
epitope tag in p19BHA followed by ZBP zinc fingers 6 and 7 (amino acids 328 to 410). An
XbaI-BamHI fragment from p19B2F containing sequences encoding ZBP zinc fingers 6 and 7 was
ligated between the SpeI and BamHI sites of p19BHH.

p19BHH4F: Contains Phox1 homeodomain (amino acids 43 to 150) fused in frame to the
epitope tag in p19BHA followed by ZBP zinc fingers 4, 5, 6 and 7 (amino acids 300 to 410). An
XbaI-BamHI fragment from p19B4F containing sequences encoding ZBP zinc fingers 4, 5, 6 and 7
was ligated between the SpeI and BamHI sites of p19BHH.

p19BHH7F: Contains Phox1 homeodomain (amino acids 43 to 150) fused in frame to the
epitope tag in p19BHA followed by ZBP zinc fingers 1 to 7 (amino acids 216 to 410). An
XbaI-BamHI fragment from p19B7F containing sequences encoding ZBP zinc fingers 1 to 7 was
ligated between the SpeI and BamHI sites of p19BHH.

p19BHHZF1: Contains Phox1 homeodomain (amino acids 43 to 150) fused in frame to the
epitope tag in p19BHA followed by ZBP zinc finger 1 (amino acids 204 to 241). An XbaI-BamHI
fragment from p19BZF1 containing sequences encoding ZBP zinc finger 1 was ligated between the
SpeI and BamHI sites of p19BHH.

p19BHHZF123: Contains Phox1 homeodomain (amino acids 43 to 150) fused in frame to the
epitope tag in p19BHA followed by ZBP zinc fingers 1, 2 and 3 (amino acids 204 to 297). An
XbaI-BamHI fragment from p19BZF123 containing sequences encoding ZBP zinc fingers 1, 2 and 3
was ligated between the SpeI and BamHI sites of p19BHH.

PCR Primers:

SRE-ZBP
2F-Xba5':            5'-TCAGTCTAGATGTAACATATGCCAGAAAGCCTTC-3'
4F-Xba5':            5'-TCAGTCTAGATGCAAGGAGTGTGGAAAAACCTTT-3'
7F-Xba5':            5'-TCAGTCTAGATGTCATGAGTGTGGGAAAGCCTTT-3'
ZNF-Spe/Bam:         5'-TCAGGGATCCTCAATAACTAGTAGCCAGTTTGTCTTTGTGGTGATA-3'
ZBPZF15':            5'-TCAGTCTAGACATAAGAAAGTCCTCTCTAG-3'
ZBPZF13':            5'-TCAGGGATCCTCTATATCAACTAGTAGGCTTCTCACCAAGATGG-3'
ZBPZF33':            5'-TCAGGGATCCTCTATATCAACTAGTGGGCTCCTCCTGACTGTG-3'

PHOX1
Phox HH 5' Primer:   5'-TCAGTCTAGAGGCCGGAGCCTGCTGGAGT-3'
Phox HH Spe/Bam:     5'-TCAGGGATCCTCAATAACTAGTGTAGGATTTGAGGAGGGAA-3'
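
The cloning scheme of this Example relies on each 5' primer adding an XbaI site and each 3' primer adding SpeI and BamHI sites. A quick way to check the SRE-ZBP primers for this pattern is sketched below in Python. The sequences are transcribed from the list above with OCR characters corrected, the pairing of names to sequences follows the order in which they are listed, and the function name is an assumption made for this illustration.

```python
SITES = {"XbaI": "TCTAGA", "SpeI": "ACTAGT", "BamHI": "GGATCC"}

# SRE-ZBP primers from the list above (5' to 3').
PRIMERS = {
    "2F-Xba5'": "TCAGTCTAGATGTAACATATGCCAGAAAGCCTTC",
    "4F-Xba5'": "TCAGTCTAGATGCAAGGAGTGTGGAAAAACCTTT",
    "7F-Xba5'": "TCAGTCTAGATGTCATGAGTGTGGGAAAGCCTTT",
    "ZNF-Spe/Bam": "TCAGGGATCCTCAATAACTAGTAGCCAGTTTGTCTTTGTGGTGATA",
    "ZBPZF15'": "TCAGTCTAGACATAAGAAAGTCCTCTCTAG",
    "ZBPZF13'": "TCAGGGATCCTCTATATCAACTAGTAGGCTTCTCACCAAGATGG",
    "ZBPZF33'": "TCAGGGATCCTCTATATCAACTAGTGGGCTCCTCCTGACTGTG",
}


def sites_in(primer):
    """Names of the restriction sites present in a primer, in SITES order."""
    return [name for name, site in SITES.items() if site in primer]


site_report = {name: sites_in(seq) for name, seq in PRIMERS.items()}
for name, found in site_report.items():
    print(f"{name:12s} {', '.join(found)}")
```

In a 3' primer the BamHI site lies 5' of the SpeI site, so in the amplified product the SpeI site sits between the coding sequence and the BamHI site. This is why a second XbaI-BamHI fragment can later be ligated between the SpeI and BamHI sites, as in the composite constructs above.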



Equivalents 

The invention disclosed herein is of broad applicability and is susceptible to many useful
variations within the context described and illustrated herein. Those skilled in the art will
recognize or be able to ascertain from the foregoing disclosure, using no more than routine
experimentation, many valuable equivalents to the specific embodiments of the invention
described herein. Such equivalents are intended to be encompassed by the following claims.






MICROORGANISMS

A. IDENTIFICATION OF DEPOSIT

Depositary institution: American Type Culture Collection
Address: 12301 Parklawn Drive, Rockville, Maryland 20852 USA
Deposit: pCGNN ZFHD1-FKBPx3
Referred to on page/line: 6/27-28, 45/23, 27, 46/10, 48/13, 61/30
Date of deposit: 12/28/95

(Form PCT/RO/134)




Claims 

The invention claimed is: 

1. A chimeric protein which 

( a ) selectively binds a DNA sequence with a Kd value of about 10* or better, and 

(b) contains at least one composite DNA-binding region comprising a continuous 
polypeptide chain containing two or more component polypeptide domains, at least 
two of which are mutually heterologous. 

2. A chimeric protein of Claim 1 which selectively binds a DNA sequence spanning at least 10
base pairs.

3. A chimeric protein of Claim 1, wherein at least one of the component polypeptide domains is
a homeodomain.

4. A chimeric protein of Claim 3, wherein the homeodomain is the Oct-1 homeodomain.

5. A chimeric protein of Claim 1, wherein at least one of the component polypeptide domains is
a zinc finger domain.

6. A chimeric protein of Claim 5, wherein the zinc finger domain is finger 1 or finger 2 of Zif268.

7. A chimeric protein of Claim 1 comprising a composite DNA-binding region containing a
homeodomain covalently linked to at least one zinc finger domain.

8. A chimeric protein of Claim 7 comprising the Oct-1 homeodomain covalently linked to zinc
finger 1 and/or zinc finger 2 of Zif268.

9. A chimeric protein comprising the peptide sequence of ZFHD1.

10. A chimeric protein of any of Claims 1-9, which further contains at least one additional
domain comprising a transcription activation domain, a transcription repressing domain, or a
DNA-cleaving domain.






11. A transcription factor for activating the expression of a target gene in a cell which comprises 
a chimeric protein f Cairn 10 i^4uch (a) contains at least one transcription activation 
domain and (b) is capable of binding to a DNA sequence linked to the target gene. 

12. A transcription factor of Claim 11 in which the activation domain is the Herpes Simplex 
Vims VP16 activation domain. 

13. A transcription factor of Claim 11 which comprises a composite DNA47inding region 
containing at least one homeodomain and at least one zinc finger domain. 

14. A transcripti<m factor of Cairn 11 which comprises the peptide sequence of ZFHDl. 

15. A transcription factor for repressing the expression of a target gene in a cell which comprises 
a chimeric protein of Cairn 10 which (a) contains at least one transcription repressing 
domain and (b) is capable of binding to a DNA sequence linked to the target gene. 

16. A transcription repressing factor of Claim 15 in which the transcription repressing domain is 
a Krab domain. 

17. A chimeric protein for cleaving a target DNA sequence which comprises a chimeric protein of 
Claim 10 in which the additional domain is capable of cleaving DNA, the chimeric protein 
being capable of binding to a DNA sequence Unked to the target gene. 

18. A chimeric cleavage protein of Claim 17, wherein the DNA cleavage domain is the FokI 
cleavage domain. 

19. A DNA sequence encoding a chimeric protein of any of Claims 1-18. 

20. A DNA sequence of Claim 19 which encodes a chimeric protein comprising a composite 
DNA-binding region containing a homeodomain covalently linked to at least one zinc finger 
domain. 

21. A DNA sequence of Claim 19 which encodes a chimeric protein comprising the peptide 
sequence of ZFHD1. 



22. A DNA sequence of Claim 19 which encodes a chimeric protein comprising a composite 
DNA-binding region and a transcription activation domain. 

23. A DNA sequence of Claim 19 which encodes a chimeric protein comprising a composite DNA- 
binding region and a transcription repressing domain. 

24. A DNA sequence of Claim 19 which encodes a chimeric protein comprising a composite 
DNA-binding region and a domain capable of cleaving DNA. 

25. A eukaryotic expression construct comprising a DNA sequence of any of Claims 19-24 operably 
linked to expression control elements permitting gene expression in eukaryotic cells. 

26. A eukaryotic expression construct comprising a DNA sequence encoding a transcription 
activating factor of Claim 11 operably linked to expression control elements permitting gene 
expression in eukaryotic cells. 

27. A eukaryotic expression construct comprising a DNA sequence encoding a transcription 
repressing factor of Claim 15 operably linked to expression control elements permitting gene 
expression in eukaryotic cells. 

28. A eukaryotic expression construct comprising a DNA sequence encoding a DNA-cleaving 
chimeric protein of Claim 17 operably linked to expression control elements permitting gene 
expression in eukaryotic cells. 

29. A eukaryotic expression construct of any of Claims 25-28 wherein the expression control 
elements include an inducible promoter permitting regulated expression of the DNA 
encoding the chimeric protein. 

30. A eukaryotic expression construct of Claim 25 comprising pCGNN ZFHD1-FKBPX3 (ATCC 
No. ) 

31. A method for genetically engineering cells to express a chimeric protein of any of Claims 1-18 
comprising the steps of: 



(a) providing an expression construct of any of Claims 25-30 for directing the expression 
in a eukaryotic cell of a chimeric protein which is capable of binding to a DNA 
sequence linked to a target gene; and, 

(b) introducing the expression construct into the cells in a manner permitting 
expression of the introduced DNA in at least a portion of the cells. 

32. A method of Claim 31 wherein the target gene and the DNA sequence linked thereto are 
endogenous to the cells. 

33. A method of Claim 31 which further comprises the step of introducing into the cells a DNA 
sequence containing a target gene and the DNA sequence to which the chimeric protein is 
capable of binding. 

34. A method of Claim 31 in which the cells into which the DNA is introduced are maintained 
in culture. 

35. A method of any of Claims 31-34 in which the cells are present within an organism. 

36. A method of any of Claims 31-35 wherein the chimeric protein is a transcription factor capable 
of binding to a DNA sequence linked to the target gene and activating transcription of the 
target gene. 

37. A method of any of Claims 31-35 wherein the chimeric protein is a transcription factor 
capable of binding to a DNA sequence linked to the target gene and repressing transcription of 
the target gene. 

38. A method of any of Claims 31-35 wherein the chimeric protein is capable of binding to a 
DNA sequence and cleaving DNA linked to that sequence. 

39. Genetically engineered cells containing and capable of expressing a DNA sequence encoding a 
chimeric protein of any of Claims 1-18. 

40. Genetically engineered cells of Claim 39 which further contain a DNA sequence to which the 
encoded chimeric protein is capable of binding. 






41. Genetically engineered cells of Claim 39 in which the chimeric protein activates the 
transcription of a gene linked to a DNA sequence to which the chimeric protein binds. 

42. Genetically engineered cells of Claim 39 in which the chimeric protein represses the 
transcription of a gene linked to a DNA sequence to which the chimeric protein binds. 

43. Genetically engineered cells of Claim 39 in which the chimeric protein binds to a DNA 
molecule and cleaves it. 

44. A non-human organism containing genetically engineered cells of any of Claims 39-43. 

45. A method of expressing a target gene in a cell comprising the steps of: 

(a) providing cells containing and capable of expressing a first DNA sequence encoding a 
transcription factor of Claim 11 which is capable of binding to a second DNA 
sequence linked to a target gene of interest also present within the cells; and 

(b) maintaining the cells under conditions permitting gene expression and protein 
production. 

46. A method of repressing the expression of a target gene in a cell comprising the steps of: 

(a) providing cells containing and capable of expressing a first DNA sequence encoding a 
chimeric protein of Claim 1 which is capable of 

(i) binding to a second DNA sequence linked to a target gene of interest also 
present within the cells, and 

(ii) repressing the transcription of the target gene; and, 

(b) maintaining the cells under conditions permitting gene expression and protein 
production. 

47. A method of cleaving a target DNA sequence in a cell comprising the steps of: 

(a) providing cells containing and capable of expressing a first DNA sequence encoding a 
chimeric protein of Claim 17 which is capable of 

(i) binding to a second DNA sequence linked to a target DNA sequence also 
present within cells, and 

(ii) cleaving the target DNA sequence; and, 



(b) maintaining the cells under conditions permitting gene expression and protein 
production. 

48. A method of any of Claims 45-47, wherein expression of the first DNA sequence is controlled 
by an inducible promoter, and the conditions permitting gene expression and protein 
production permit expression of the first DNA sequence. 

49. A method of identifying a DNA sequence in a mixture, comprising the steps of: 

(a) providing a mixture containing one or more DNA sequences; 

(b) contacting the mixture with a chimeric protein of Claim 1 under conditions 
permitting the specific binding of a DNA-binding protein to a DNA sequence; and, 

(c) determining the occurrence, amount and/or location of any DNA binding by the 
chimeric protein. 

50. A method of Claim 49, wherein the chimeric protein is labelled with a detectable label 
and/or a moiety permitting recovery from the mixture of the chimeric protein with any bound 
DNA. 

51. A method of Claim 49 which further comprises the step of recovering from the mixture the 
chimeric protein and any bound DNA. 






1/6 

FIGURE 1 

FIG. 1A: Domain diagram of ZFHD1 (N-terminus, FINGER 1, FINGER 2, homeodomain, C-terminus); the remaining labels are not legible in the source. 

FIG. 1B: Nucleotide sequences; not legible in the source. 

FIG. 1C: Two binding-site arrangements (MODE 1 and MODE 2) showing the homeodomain (HD) subsite TAAT/ATTA and the zinc finger subsites ZF2 (NGG/NCC) and ZF1 (GNG/CNC). 






2/6 

FIGURE 2 

FIG. 2A: ZFHD1, lanes 1-20. Probe sequences shown: ATGCAAATGA, TCATTATGGGCG, TAATGATGGGCG, GCGTGGGCG. 

FIG. 2B: Oct-1 POU, lanes 1-20. 

FIG. 2C: Zif268, lanes 1-20. 



3/6 to 6/6 

[Drawing sheets 3/6 through 6/6: images not reproduced. The only surviving text is a rotated axis label on sheet 3/6, apparently "SEAP units", and a rotated axis label on sheet 6/6, apparently "Serum hGH (ng/ml)".] 



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US95/16982 



A. CLASSIFICATION OF SUBJECT MATTER 

IPC(6) : C07K 14/00; C12N 15/00; C12P 21/00; A01K 67/00 
US CL : 530/350; 536/23.1; 435/69.1, 320.1; 800/2 
According to International Patent Classification (IPC) or to both national classification and IPC 

B. FIELDS SEARCHED 

Minimum documentation searched (classification system followed by classification symbols) 

U.S. : 530/350; 536/23.1; 435/69.1, 320.1; 800/2 

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
APS, MEDLINE, EMBASE, BIOSIS, CAPLUS 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category* 


Citation of document, with indication, where appropriate, of the relevant passages 


Relevant to claim No. 


Y, P 


POMERANTZ et al. Structure-Based Design of Transcription 
Factors. Science. 06 January 1995, Vol.267, pages 93-96, 
see entire document. 


1-18, 26-28, 45-51 


Y, P 


POMERANTZ et al. Analysis of homeodomain function by 
structure-based design of a transcription factor. Proceedings 
of the National Academy of Sciences USA. October 1995, 
Vol. 92, pages 9752-9756, see entire document. 


1-18, 26-28, 45-51 



[X] Further documents are listed in the continuation of Box C. [ ] See patent family annex. 




[Explanatory notes on the special categories of cited documents: not legible in the source.] 




Date of the actual completion of the international search 
13 MAY 1996 


Date of mailing of the international search report 

06 JUN 1996 


Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks 
Box PCT 
Washington, D.C. 20231 
Facsimile No. (703) 305-3230 


Authorized officer 
[name not legible in the source] 

Telephone No. (703) 308-0196 



Form PCT/ISA/210 (second sheet)(July 1992)* 



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US95/16982 



C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT 




Category* 


Citation of document, with indication, where appropriate, of the relevant passages 


Relevant to claim No. 


Y 


POMERANTZ et al. Recognition of the surface of a homeo 
domain protein. Genes & Development. 1992, Vol.6, pages 
2047-2057, see entire document. 


3-4, 7-8, 13 


Y 


JAMIESON et al. In Vitro Selection of Zinc Fingers with Altered 
DNA-Binding Specificity. Biochemistry. 1994, Vol.33, No. 19, 
pages 5689-5695, see entire document. 


5-8, 13 


Y 


MARGOLIN et al. Kruppel-associated boxes are potent 
transcriptional repression domains. Proceedings of the National 
Academy of Sciences USA. May 1994, Vol.91, pages 4509-4513, 
see entire document. 


16 


Y 


KIM et al. Chimeric restriction endonuclease. Proceedings of the 
National Academy of Sciences USA. February 1994, Vol.91, 
pages 883-887, see entire document. 


10, 17-18 


Y 


FRANKEL et al. Fingering Too Many Proteins. Cell. June 3, 
1988, Vol.53, page 675, see entire document. 


5-8, 13 


Y 


THIESEN et al. THE KRAB DOMAIN, A 
TRANSCRIPTIONAL SILENCER PRESENT IN MORE THAN 
HUNDRED HUMAN KRUPPEL-TYPE ZINC FINGER 
PROTEINS. Experimental Hematology. 1995, Vol.23, No.8, 
page 779, Abstract No. 137. 


16 


Y 


POMERANTZ et al. STRUCTURE-BASED DESIGN OF 
TRANSCRIPTION FACTORS WITH NOVEL DNA-BINDING 
SPECIFICITIES. Journal of Cellular Biochemistry Supplement. 
1995, Vol. 21A, page 382, Abstract No. C6-232. 


1-18, 26-28, 45- 
51 


Y 


SHARP et al. REGULATION OF TRANSCRIPTION BY 
COMPLEXES OF PROTEINS. Journal of Cellular Biochemistry. 
1993, Vol.17, Part D, page 2, Abstract No. M002. 


1-18, 26-28, 45- 
51 



Form PCT/ISA/210 (continuation of second sheet)(July 1992)* 



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US95/16982 



Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet) 
This international report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons: 

1. [ ] Claims Nos.: 

because they relate to subject matter not required to be searched by this Authority, namely: 



2. [ ] Claims Nos.: 

because they relate to parts of the international application that do not comply with the prescribed requirements to such 
an extent that no meaningful international search can be carried out, specifically: 



3. [X] Claims Nos.: 19-25 and 29-44 

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 



Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet) 

This International Searching Authority found multiple inventions in this international application, as follows: 



1. [ ] As all required additional search fees were timely paid by the applicant, this international search report covers all searchable 
claims. 

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment 
of any additional fee. 

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this international search report covers 
only those claims for which fees were paid, specifically claims Nos.: 



4. [ ] No required additional search fees were timely paid by the applicant. Consequently, this international search report is 
restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 



Remark on Protest [ ] The additional search fees were accompanied by the applicant's protest. 

[ ] No protest accompanied the payment of additional search fees. 



Form PCT/ISA/210 (continuation of first sheet(1))(July 1992)* 



